Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-24 Thread Shai Berger
On Wednesday 23 March 2016 11:50:59 Hugo Chargois wrote:
> 
> ... MySQL's REPEATABLE READ may have
> its flaws, it may cause repeating bugs because that level is a bit awry,
> with its reads and writes that don't work the same, but all in all, it IS a
> higher isolation level than READ COMMITTED. It DOES provide the REPEATABLE
> READ guarantee (for reads) that READ COMMITTED doesn't have.
> 

As I have shown, this is only true under certain conditions.

> 
> But I still want to insist on the fact that the bug discussed in the ticket
> is quite independent from the choice of the default isolation level. Sure,
> setting a lesser default isolation level fixes it coincidentally, but it 
> can also be fixed (in two different ways, even) with a better use of the
> same isolation level. It shouldn't by itself justify the change of the
> default isolation level.

This is a sentiment I agree with (except for the "coincidental" 
qualification). The change of isolation level is intended to solve a whole 
class of bugs, most of them in user code; the bug that prompted this 
discussion is merely one example, and by itself wouldn't justify the change.

Shai.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-24 Thread Shai Berger
While there are some different opinions expressed here (mine included), the 
rough consensus seems to be to apply the for_update() patch to 1.8.x (it is 
neither needed nor applicable to 1.9.x) and change the transaction isolation 
level for 1.10 and going forward.

To the best of my knowledge, nobody is working on patches yet; I have had no 
time for it myself in the last few days.
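
For projects that want the proposed default before it lands, the isolation level can already be forced per connection. A hedged sketch of a settings.py fragment: the MySQL backend passes OPTIONS to the MySQLdb driver, whose 'init_command' runs on each new connection; the database name is hypothetical.

```python
# Hedged sketch: opt into READ COMMITTED on MySQL today via settings.py.
# 'init_command' is executed by the driver when each connection is opened.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",  # hypothetical database name
        "OPTIONS": {
            # Sets the session isolation level, matching the behavior
            # proposed as the Django default going forward.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}
```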


On Thursday 24 March 2016 21:45:02 Ben Cail wrote:
> What's the current status on this? Is the .for_update() change mentioned
> in the bug report the best way to go? Is anyone working on a PR?
> 
> On 03/18/2016 05:15 PM, Karen Tracey wrote:
> > This is the 2nd major issue I can recall caused by MySQL default of
> > REPEATABLE READ transaction isolation level. I think Django should
> > simply switch itself to a default of using READ COMMITTED, consistent
> > with (all?) other supported database backends, and document how, if a
> > user really really wants to use REPEATABLE READ, they can do so (I
> > assume Django could allow that?), and what known problems when using
> > basic Django functions they may run into if they do so.
> > 
> > I fear our existing approach of documenting how certain functions
> > don't work by default on MySQL (e.g. get_or_create) is not really
> > helping the majority of our users. I believe switching instead to
> > making Django code itself work by default on MySQL would be a better
> > long-term solution for those who use MySQL with Django, and avoid
> > future cases like this one that has been discovered (years after we
> > knew get_or_create was broken by default transaction isolation level
> > on MySQL).
> > 
> > On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham wrote:
> > Could some MySQL users take a look at ticket #26347 [0] and
> > recommend how to proceed? I think it's probably not a new issue
> > but I'm a bit surprised it hasn't come up before if so.
> > 
> > [0] https://code.djangoproject.com/ticket/26347


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-24 Thread Ben Cail
What's the current status on this? Is the .for_update() change mentioned 
in the bug report the best way to go? Is anyone working on a PR?


On 03/18/2016 05:15 PM, Karen Tracey wrote:
This is the 2nd major issue I can recall caused by MySQL default of 
REPEATABLE READ transaction isolation level. I think Django should 
simply switch itself to a default of using READ COMMITTED, consistent 
with (all?) other supported database backends, and document how, if a 
user really really wants to use REPEATABLE READ, they can do so (I 
assume Django could allow that?), and what known problems when using 
basic Django functions they may run into if they do so.


I fear our existing approach of documenting how certain functions 
don't work by default on MySQL (e.g. get_or_create) is not really 
helping the majority of our users. I believe switching instead to 
making Django code itself work by default on MySQL would be a better 
long-term solution for those who use MySQL with Django, and avoid 
future cases like this one that has been discovered (years after we 
knew get_or_create was broken by default transaction isolation level 
on MySQL).


On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham wrote:


Could some MySQL users take a look at ticket #26347 [0] and
recommend how to proceed? I think it's probably not a new issue
but I'm a bit surprised it hasn't come up before if so.

[0] https://code.djangoproject.com/ticket/26347
-- 
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/286b0efb-673f-42d7-a1f3-5de76fc039c5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-23 Thread Hugo Chargois
On Tuesday 22 March 2016 23:33:10 UTC+1, Shai Berger wrote:
> It is Django's job to try, as best as it can, to fulfill these
> expectations.

How could I disagree? Of course, if there is one single sensible, obvious 
default, that would help 99.9 % of users, like your webserver analogy, it 
should be set.

> MySql's REPEATABLE READ is unsuitable for use as a default by Django. 

This is where things are not as obvious... MySQL's REPEATABLE READ may have 
its flaws, it may cause repeating bugs because that level is a bit awry, 
with its reads and writes that don't work the same, but all in all, it IS a 
higher isolation level than READ COMMITTED. It DOES provide the REPEATABLE 
READ guarantee (for reads) that READ COMMITTED doesn't have.

For each "DELETE-SELECT-INSERT" bug that happens "because of" REP READ, how 
many, I don't know, "SELECT-SELECT-again-and-not-have-the-same-results" 
bugs are prevented by it?

That would be hard to say for sure.

For that user-transaction behavior, I'm in favor of a documentation fix. On 
the "Transaction" documentation page, have a not that read something like 
"Each backend can be configured in multiple isolation levels, and the 
isolation guarantees differ by isolation level and also between backends. 
In particular, MySQL's REP READ is weird". Maybe offer a way to select the 
desired isolation level in code, for each transaction even. Caveat emptor. 
After all, apart from SERIALIZABLE, transactions will never be perfect.

But I still want to insist on the fact that the bug discussed in the ticket 
is quite independent from the choice of the default isolation level. Sure, 
setting a lesser default isolation level fixes it coincidentally, but it 
can also be fixed (in two different ways, even) with a better use of the 
same isolation level. It shouldn't by itself justify the change of the 
default isolation level.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-22 Thread Shai Berger
On Tuesday 22 March 2016 11:57:03 Hugo Chargois wrote:
> 
> I'm unsure that your relentless rant against MySQL is useful in this topic.

I'm sorry if anyone sees it as a rant. It is intended as a technical argument 
that MySql's REPEATABLE READ is unsuitable for use as a default by Django.

> 
> You're missing the point. First, that same "dance" in user code is nowhere
> near as much used as the one in Django internals. As a user, I've never
> ever used explicit Django transactions, at all. Have I saved models? Sure.
> Loads.

Most non-trivial Django projects tie transactions to requests, and don't worry 
about transactions beyond that. Their authors expect basic, limited 
consistency guarantees to be gained by that. It is Django's job to try, as 
best as it can, to fulfill these expectations.
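
The request-tied transactions described here correspond to Django's ATOMIC_REQUESTS setting, which wraps each view in a transaction that commits on success and rolls back on an exception. A minimal sketch (database name hypothetical):

```python
# Sketch of the common "transaction per HTTP request" configuration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",           # hypothetical database name
        "ATOMIC_REQUESTS": True,  # one transaction per request
    }
}
```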

> Second, the onus is Django's ORM to get its moves right. It's on me
> to get mine right. I know that. I accept that. As a user, that's probably
> the most important reason why I'm using an ORM. Because I don't know a lot
> about database transactions caveats, and therefore I trust the ORM to do
> transactions right and not eat my data. If someday I ever need to do tricky
> things and I need to do transactions myself, I will take a hard look on how
> transactions work on my database backend, and if I get it wrong, it will be
> my fault.

I'll answer this argument with an analogy. Imagine that we're back in a world 
where WSGI doesn't exist, and so Django needs to provide specific adapters to 
webservers. Further, a specific webserver -- let's call it "MyHttp" -- became 
very popular, and Django supports it. And then it turns out that, by default, 
if you try to send a page longer than 16KB on HTTPS, MyHttp instead sends a 
redirect to HTTP and serves the content unencrypted -- optimizing for using 
less server resources. A user can prevent that by using some API to declare 
their application as "sensitive". Consider also that, somehow, this 
optimization is not very well known. It is explained somewhere within MyHttp's 
documentation, but the generally known thing is that MyHttp supports HTTPS, 
although "sometimes there are issues with that".

Would you claim that it isn't Django's place to make applications "sensitive" 
by default? That if users employ MyHttp for secure Django applications, it is 
their responsibility to find out all about the limitations of their web server, 
and any expectations not based on reading its whole documentation are 
irresponsible, and Django shouldn't care?

I, for one, expect better of Django.

Shai.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-22 Thread Hugo Chargois
On Tuesday 22 March 2016 00:11:31 UTC+1, Shai Berger wrote:
> I disagree. The ORM cannot keep you safe against MySql's REPEATABLE READ.
>
> >> Incidentally, to prove my point, this has been changed in Django 1.9
> >> and data-loss doesn't happen anymore, in that same default isolation level.
>
> That indeed seems to support your point, but it is far from proving it.

I'm unsure that your relentless rant against MySQL is useful in this topic. 
You have to acknowledge that the data-loss behavior, as far as the bug that 
started this topic is concerned, is fixed in 1.9 by making better queries 
against the database. It is also fixed by your SELECT FOR UPDATE patch. The 
point I'm trying to make is not that the ORM should magically be perfect 
against all backend, however broken they may be, but that it should 
guarantee the integrity of data as much as is allowed by a intelligent use 
of the backend. Using SELECT FOR UPDATE or whatever is done in 1.9 is a 
smarter use of the MySQL backend.

What are you trying to prove here? That MySQL is broken? That may or may 
not be the case (probably not), but this is quite off-topic. Even if it 
doesn't adhere to the ISO/ANSI-standard for isolation levels behavior, 
there's not a lot you can do. MySQL is by far the most popular open-source 
DBMS. I hate it as much as you and would heartily prefer to use PostgreSQL, 
but that's a fact. There may be grounds for whining on MySQL bugtracker, 
but I'm not sure this would go anywhere. They probably won't care; given 
their popularity, they are basically the de facto SQL standard. And even if they 
care, they would be unwilling to break compatibility by introducing changes 
to how isolation levels work. And even if they do change the isolation 
levels, it would take some time for users to get that new MySQL version. In 
all cases, Django would still need fixing.

> The reason for that, and the major point you seem to be missing, is that
> Django 1.8's DELETE-SELECT-INSERT dance for updating M2M's is not reserved
> to Django's internals; you can be quite certain that there are a *lot* of
> similar examples in user code. And that user code, making assumptions that
> hold everywhere except with MRR, is not buggy. It is MySql's documented
> behavior that is insensible.

You're missing the point. First, that same "dance" in user code is nowhere 
near as much used as the one in Django internals. As a user, I've never 
ever used explicit Django transactions, at all. Have I saved models? Sure. 
Loads. Second, the onus is on Django's ORM to get its moves right. It's on me 
to get mine right. I know that. I accept that. As a user, that's probably 
the most important reason why I'm using an ORM. Because I don't know a lot 
about database transactions caveats, and therefore I trust the ORM to do 
transactions right and not eat my data. If someday I ever need to do tricky 
things and I need to do transactions myself, I will take a hard look on how 
transactions work on my database backend, and if I get it wrong, it will be 
my fault.

Maybe a note on Django's "Database transactions" page would be helpful to 
warn users about the limitations of MySQL?

> I can reproduce this. But frankly, I cannot understand it. Can you explain
> what happens here? Point to some documentation? The way I understand it,
> the DELETE from session B either attempts to delete the pre-existing record
> (in which case, when it resumes after the lock, it should report "0 records
> affected" -- that record has been already deleted) or the one inserted by
> session A (in which case, the second SELECT in session B should retrieve no
> records). In either case, the SELECT in session B should not be finding any
> record which has already been deleted, because all deletions have either
> been committed (and we are in READ COMMITTED) or have been done in its own
> transaction. What is going on here?
>
> The point of the above question, besides academic interest, is that this
> behavior is also suspect of being a bug. And if it is, Django should try
> to work around it and help users work around it, but not base its strategic
> decisions on its existence.

Sorry, I have no idea why this example behaves like that. I tried it, 
thinking "that could break, maybe", and it did, and that's it.
 

> Again: The burden of "finding operation sequences" on Django is relatively 
> small -- most ORM operations are single queries.
>

Great, then maybe this bug is one of the last of its kind :) And a manual 
search for others, by finding "maybe broken" multiple-queries operations, 
then testing them concurrently, then if need be optimizing/fixing them 
would be within reach.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Shai Berger
Hi,

On Monday 21 March 2016 13:01:27 hcharg...@medici.tv wrote:
> 
> I agree with Karen that documenting that some functions don't work is not
> helping. Especially since this is Django's fault, not MySQL's. I mean,
> sure, MySQL has some very weird isolation modes, I definitely agree with
> Shai on that, but an ORM's job is to know all that, and to use the database
> in such ways that it is safe. I reckon those safe ways exist,

I disagree. The ORM cannot keep you safe against MySql's REPEATABLE READ.

> and that the
> problem in Django is not the MySQL default isolation level, but the
> DELETE-SELECT-INSERT way of updating M2Ms.

The reason for that, and the major point you seem to be missing, is that 
Django 1.8's DELETE-SELECT-INSERT dance for updating M2M's is not reserved to 
Django's internals; you can be quite certain that there are a *lot* of similar 
examples in user code. And that user code, making assumptions that hold 
everywhere except with MRR, is not buggy. It is MySql's documented behavior 
that is insensible.

> Incidentally, to prove my point,
> this has been changed in Django 1.9 and data-loss doesn't happen anymore,
> in that same default isolation level.
> 

That indeed seems to support your point, but it is far from proving it.

It is trivial to show that MRR simply isn't repeatable read:

create table yoyo (id int, age int, rank int);
insert into yoyo values (1,2,3);

Now, using MySql's so-called REPEATABLE READ,

SESSION A                           SESSION B
=========                           =========
SET autocommit=0;                   SET autocommit=0;

SELECT * FROM yoyo;                 SELECT age FROM yoyo;
(1,2,3)                             (2)

UPDATE yoyo SET age=5;
(ok, 1 row affected)

                                    UPDATE yoyo SET rank=7;
                                    (blocks)

COMMIT;
(ok)
                                    (resumes)
                                    (ok, 1 row affected)

                                    SELECT age FROM yoyo;
                                    (5)

Session B sees, in its reads, changes made by session A after B's transaction 
started. This is called, in the SQL standard, "a non-repeatable read". This is 
*exactly* the situation that REPEATABLE READ is required to prevent. Calling a 
transaction isolation level that allows this "REPEATABLE READ" is a lie. At 
the very least, MySql needs to change the name of this isolation level to 
something less misleading.

> It seems that setting a READ COMMITTED default level would help in that
> particular schedule of operations that is described in the ticket, but I
> disagree that it will work always and not cause problems elsewhere. For
> example, given a different schedule of the same operations needed to update
> a M2M, (DELETE-SELECT-INSERT), under READ COMMITTED, data-loss could still
> happen (see attachment).
> 

I can reproduce this. But frankly, I cannot understand it. Can you explain 
what happens here? Point to some documentation? The way I understand it, the 
DELETE from session B either attempts to delete the pre-existing record (in 
which case, when it resumes after the lock, it should report "0 records 
affected" -- that record has been already deleted) or the one inserted by 
session A (in which case, the second SELECT in session B should retrieve no 
records). In either case, the SELECT in session B should not be finding any 
record which has already been deleted, because all deletions have either been 
committed (and we are in READ COMMITTED) or have been done in its own 
transaction. What is going on here?

The point of the above question, besides academic interest, is that this 
behavior is also suspect of being a bug. And if it is, Django should try to 
work around it and help users work around it, but not base its strategic 
decisions on its existence.

> 
> In my opinion, the fix should not be to set a different default isolation
> level, as that could trigger other problems that may be very hard to find.
> I was lucky to find this one. I think that Django should find sequences of
> operations that are proven to be safe under specific isolation levels.

Again: The burden of "finding operation sequences" on Django is relatively 
small -- most ORM operations are single queries. Most of the burden created by 
MySql's REPEATABLE READ falls on users, who should not be expected to deal 
with such blatant violations of common standards and terminology.

Shai.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Aymeric Augustin
On 21 Mar 2016, at 15:02, Hugo Chargois wrote:
> 
> But here, we have "two non-conflicting updates (no-ops, even) cause data to
> be lost". I bet no one would call this safe.

Yes, this one was clearly a bug (and, if I understand correctly, it was 
resolved by a not-directly-related change, originally considered a performance 
optimization).

What I meant, but failed to convey clearly, is: can we derive general and 
useful rules from this bug, or are we stuck with dealing with such problems on 
a case by case basis?

In theory, we could:

1. make a list of “atomic” ORM operations (create, update, delete, etc.) — 
things you write in one line and thus expect to complete consistently
2. figure out which operations incur multiple queries — concrete inheritance 
and many-to-many relationships come to mind
3. investigate what unexpected results could occur in the presence of 
concurrent queries — I have no idea how to do that exhaustively

Furthermore, I suspect the answer to 3. will be “anything could happen” at 
lower isolation levels and “that looks to be safe” at higher isolation levels. I 
also suspect there’s little we could do to improve the situation at lower 
isolation levels, except if we have other instances of unoptimized M2M code. 
(Optimizing for performance means making fewer queries means improving chances 
of transactional correctness.)

In the end, “figure out where the ORM could do fewer queries” sounds like an 
easier plan than “figure out all possible sequences of queries and whether 
they’re safe”.

-- 
Aymeric.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Hugo Chargois
On Monday 21 March 2016 13:35:19 UTC+1, Aymeric Augustin wrote:
>
>
> The problem is to determine what “safe” means here :-) I’m afraid this 
> won’t be possible in general (at least not in a remotely efficient fashion) 
> because Django inevitably depends on the guarantees of the underlying 
> database, and those guarantees are weak at lower isolation levels.
>

Sure, some of it is left to interpretation. There are some tricky cases, 
like two conflicting updates happening at the same time. Would safe mean 
that they should both succeed, even if there would be a lost update? Or 
would it mean that one of them should fail? But here, we have "two 
non-conflicting updates (no-ops, even) cause data to be lost". I bet no 
one would call this safe.

The guarantees offered by the database may not be perfect, but my point is 
that those guarantees are used badly in Django. Consistent read is a 
"feature", it can be used to make things "safer", but in that bug, it's 
simply used incorrectly because of false assumptions. I reckon that given 
those same guarantees, and knowing how they may be different from backend 
to backend, they can be used to have a "safer" behavior than this.

That's also why I don't think lowering the isolation level is a good idea. 
The higher the isolation level, the more "guarantees" to build upon there 
are. This bug is an unfortunate case where an assumption that is true in a 
lower isolation level is not anymore in a higher one. That doesn't mean 
that the assumption was safe to make in the first place.

I applied the patch proposed by Shai (thanks!) in the ticket, that replaces 
the SELECT by a SELECT FOR UPDATE. This effectively fixes the bug. I think 
this is the right kind of fix. Fix the code, do not change the isolation 
level because it fixes the code by chance.
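
The effect of the SELECT FOR UPDATE fix can be caricatured in plain Python: the row lock forces each transaction's SELECT-DELETE-INSERT dance to finish before another one starts. This is an analogy using a threading lock, not real database locking; all names here are illustrative only.

```python
import threading

# Toy stand-in for SELECT ... FOR UPDATE: serialize the read-modify-write
# sequence so concurrent "saves" cannot interleave and lose rows.
rows = {"tag_ids": {1, 2}}     # pretend M2M link table for one object
row_lock = threading.Lock()    # plays the role of the database row lock

def set_tags(new_ids):
    with row_lock:                         # analogue of SELECT ... FOR UPDATE
        current = set(rows["tag_ids"])     # SELECT the existing links
        rows["tag_ids"] -= current         # DELETE the old links
        rows["tag_ids"] |= set(new_ids)    # INSERT the new set

# Two concurrent no-op saves of the same value must not lose data.
threads = [threading.Thread(target=set_tags, args=({1, 2},)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rows["tag_ids"])  # {1, 2} -- nothing lost
```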



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Aymeric Augustin
Hello,

> On 21 Mar 2016, at 12:01, hcharg...@medici.tv wrote:
> 
> In my opinion, the fix should not be to set a different default isolation 
> level, as that could trigger other problems that may be very hard to find. I 
> was lucky to find this one. I think that Django should find sequences of 
> operations that are proven to be safe under specific isolation levels.

The problem is to determine what “safe” means here :-) I’m afraid this won’t be 
possible in general (at least not in a remotely efficient fashion) because 
Django inevitably depends on the guarantees of the underlying database, and 
those guarantees are weak at lower isolation levels.

-- 
Aymeric.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Cristiano Coelho
Shai Berger, this explanation is pure gold! Definitely better than MySQL's 
own.

Now, I may agree that changing the isolation level should probably be left 
to a major release, or else carry a huge warning in the release notes. I 
personally can't think of any project of mine where this would be an issue, 
since database errors are usually well handled (and I think this change 
would increase them, since you will be reading more data modified by other 
transactions, if I read correctly), and critical operations should be 
handled with either a select for update or transactional operations (such 
as doing col = col + 1).

On Monday, 21 March 2016 at 4:27:59 (UTC-3), Shai Berger wrote:
>
> First of all, I would like to say that I strongly support the move to READ
> COMMITTED, including backporting it to 1.8.x.
>
> But we also need to explain: REPEATABLE READ is a higher transaction
> isolation level than READ COMMITTED. If you have problematic code, it
> should lead to more deadlocks and/or transactions failing at commit time
> (compared to READ COMMITTED), not to data loss. The reason we get data
> losses is MySql's unique interpretation of REPEATABLE READ. If you're
> interested in the details (and if you use MySql, you should be), read on.
>
> With MySql's REPEATABLE READ, the "read" operations -- SELECT statements --
> indeed act like they act in the usual REPEATABLE READ: once you've read
> some table, changes made to that table by other transactions will not be
> visible within your transaction. But "write" operations -- UPDATE, DELETE,
> INSERT and the like -- act as if they're under READ COMMITTED, affecting
> (and affected by) changes committed by other transactions. The result is,
> essentially, that within a transaction, the reads are not guaranteed to be
> consistent with the writes [1].
>
> In particular, in the bug [2] that caused this discussion, we get the
> following behavior in one transaction:
>
> (1) BEGIN TRANSACTION
>
> (2) SELECT ... FROM some_table WHERE some_field=some_value
>     (1 row returned)
>
> (3) (some other transactions commit)
>
> (4) SELECT ... FROM some_table WHERE some_field=some_value
>     (1 row returned, same as above)
>
> (5) DELETE FROM some_table WHERE some_field=some_value
>     (answer: 1 row deleted)
>
> (6) SELECT ... FROM some_table WHERE some_field=some_value
>     (1 row returned, same as above)
>
> (7) COMMIT
>     (the row that was returned earlier is no longer in the database)
>
> Take a minute to read this. Up to step (5), everything is as you would
> expect; you should find steps (6) and (7) quite surprising.
>
> This happens because the other transactions in (3) deleted the row that is
> returned in (2), (4) & (6), and inserted another one where
> some_field=some_value; that other row is the row that was deleted in (5).
> The row that this transaction selects was not seen by the DELETE, and hence
> not changed by it, and hence continues to be visible by the SELECTs in our
> transaction. But when we commit, the row (which has been deleted) no longer
> exists.
>
> I have expressed elsewhere my opinion of this behavior as a general
> database feature, and feel no need to repeat it here; but I think that, if
> possible, it is Django's job as a framework to protect its users from it,
> at least as a default.
>
> On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
> > What performance changes can you expect doing this change? It is
> > probably that default on MySQL for a good reason.
>
> The Django project is usually willing to give up quite a lot of
> performance in order to prevent data losses. I agree that this default on
> MySql is probably for a reason, but I don't think it can be a good reason
> for Django.
>
> Have fun,
> Shai.
>
> [1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
> [2] https://code.djangoproject.com/ticket/26347



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread hchargois
Hi,

I'm the ticket OP.

As a user, I definitely expect that my ORM won't eat my records when I save 
something, even with concurrency. Furthermore, I expect it to work with the 
default configuration of the backend, or if I have to change it, the ORM 
should refuse to work until the configuration is right, or at least display 
a huge warning.

I agree with Karen that documenting that some functions don't work is not 
helping. Especially since this is Django's fault, not MySQL's. I mean, 
sure, MySQL has some very weird isolation modes, I definitely agree with 
Shai on that, but an ORM's job is to know all that, and to use the database 
in such ways that it is safe. I reckon those safe ways exist, and that the 
problem in Django is not the MySQL default isolation level, but the 
DELETE-SELECT-INSERT way of updating M2Ms. Incidentally, to prove my point, 
this has been changed in Django 1.9 and data-loss doesn't happen anymore, 
in that same default isolation level.

It seems that setting a READ COMMITTED default level would help in that 
particular schedule of operations described in the ticket, but I disagree 
that it will always work and won't cause problems elsewhere. For example, 
given a different schedule of the same operations needed to update a M2M 
(DELETE-SELECT-INSERT), data loss could still happen under READ COMMITTED 
(see attachment).

Granted, this would be less of a problem, because it would (I think) 
require very close synchronicity of the two sets of operations. That 
requirement does not hold in the ticket's case: it isn't evident from the 
ticket's step-by-step, but a great deal of time passes between the two 
SELECTs in sessions A and B and the DELETE in session A. This is because 
(if I understand correctly) a single transaction is started to update all 
of the model's fields, and that same transaction includes the update to 
the M2M field. Thus, the requirement is only that those two long 
transactions are synchronized.

As a matter of fact, this means that the bigger the model, the easier it 
is to trigger the bug. I tried to replicate it in a minimal toy project 
and couldn't get enough synchronicity to do so (I only tried by 
double-clicking the save button in the admin site quickly, though). On my 
production project, I can reproduce the bug every time.

In my opinion, the fix should not be to set a different default isolation 
level, as that could trigger other problems that may be very hard to find. 
I was lucky to find this one. I think that Django should find sequences of 
operations that are proven to be safe under specific isolation levels. 
Those may need to be tailored for each backend. Maybe for PostgreSQL, that 
DELETE-SELECT-INSERT is safe, but it simply isn't in MySQL, neither in 
REPEATABLE READ nor READ COMMITTED. Thus this sequence could be used for 
PostgreSQL, but another one should be found for MySQL.
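The delta-based alternative to DELETE-SELECT-INSERT mentioned above can be sketched as follows. This is an illustration only, not Django 1.9's actual implementation; the `book_authors` table and its columns are hypothetical, and sqlite3 stands in for MySQL purely so the sketch is self-contained. The point is that only rows that actually changed are touched, so a concurrent writer's untouched rows are never blanket-deleted and re-inserted:

```python
import sqlite3

def set_m2m(conn, source_id, new_target_ids):
    """Replace the target set for source_id by computing a delta."""
    cur = conn.execute(
        "SELECT target_id FROM book_authors WHERE source_id = ?", (source_id,)
    )
    current = {row[0] for row in cur.fetchall()}
    new = set(new_target_ids)

    # Delete only the pairs that are no longer wanted...
    for tid in current - new:
        conn.execute(
            "DELETE FROM book_authors WHERE source_id = ? AND target_id = ?",
            (source_id, tid),
        )
    # ...and insert only the genuinely new pairs.
    for tid in new - current:
        conn.execute(
            "INSERT INTO book_authors (source_id, target_id) VALUES (?, ?)",
            (source_id, tid),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_authors (source_id INTEGER, target_id INTEGER)")
conn.executemany("INSERT INTO book_authors VALUES (?, ?)", [(1, 10), (1, 11)])

set_m2m(conn, 1, [11, 12])  # 10 is removed, 11 is left alone, 12 is added
rows = sorted(conn.execute("SELECT target_id FROM book_authors WHERE source_id = 1"))
print(rows)  # [(11,), (12,)]
```

Since the unchanged pair (1, 11) is never deleted, there is no window in which the relation table is empty for rows that were supposed to survive the update.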

On Monday, 21 March 2016 at 09:41:00 UTC+1, Shai Berger wrote:
>
> My recommendation to backport is based on the observation that the 
> peculiar REPEATABLE READ behavior is highly conducive to data loss in the 
> presence of concurrency, combined with a sense that it is not very well 
> known; I find it much more likely that the change will fix broken code than 
> break really working code. 
>
> On 21 במרץ 2016 09:59:25 GMT+02:00, "Anssi Kääriäinen" wrote:
>>
>> I'm strongly -1 on changing the default isolation level in a minor
>> release. We can recommend users switch the level and complain loudly
>> if they don't. But just changing the isolation level has potential for
>> breaking working code.
>>
>>  - Anssi
>>
>> On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger wrote:
>>
>>>  First of all, I would like to say that I strongly support the move to READ
>>>  COMMITTED, including backporting it to 1.8.x.
>>>
>>>  But we also need to explain: REPEATABLE READ is a higher transaction 
>>> isolation
>>>  level than READ COMMITTED. If you have problematic code, it should lead to
>>>  more deadlocks and/or transactions failing at commit time (compared to READ
>>>  COMMITTED), not to data loss. The reason we get data losses is MySql's 
>>> unique
>>>  interpretation of REPEATABLE READ. If you're interested in the details 
>>> (and if
>>>  you use MySql, you should be), read on.
>>>
>>>  With MySql's REPEATABLE READ, the "read" operations -- SELECT statements --
>>>  indeed act like they act in the usual REPEATABLE READ: Once you've read 
>>> some
>>>  table, changes made to that table by other transactions will not be visible
>>>  within your transaction. But "write" operations -- UPDATE, DELETE, INSERT 
>>> and
>>>  the like -- act as if they're under READ COMMITTED, affecting (and 
>>> affected by)
>>>  changes committed by other transactions. The result is, essentially, that
>>>  within a transaction, the reads are not guaranteed to be consistent with 
>>> the
>>>  writes [1].
>>>
>>>  In particular, i

Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Aymeric Augustin
We’ve had, and known about, such problems for so long that I’m also
inclined not to make that change in already released versions.

-- 
Aymeric.

> On 21 Mar 2016, at 12:33, Anssi Kääriäinen  wrote:
> 
> This being MySQL I wouldn't be surprised if changing the isolation
> level would introduce new problems. Also, user code might rely on
> Django using repeatable read. If we introduce the change into stable
> releases, we risk breaking sites that work perfectly well currently.
> To me this is against our backwards compatibility rules which state
> that we avoid doing such changes except when completely unavoidable.
> 
> - Anssi
> 
> On Mon, Mar 21, 2016 at 10:40 AM, Shai Berger  wrote:
>> My recommendation to backport is based on the observation that the peculiar
>> REPEATABLE READ behavior is highly conducive to data loss in the presence
>> of concurrency, combined with a sense that it is not very well known; I find
>> it much more likely that the change will fix broken code than break really
>> working code.
>> 
>> On 21 במרץ 2016 09:59:25 GMT+02:00, "Anssi Kääriäinen" 
>> wrote:
>>> 
>>> I'm strongly -1 on changing the default isolation level in a minor
>>> release. We can recommend users switch the level and complain loudly
>>> if they don't. But just changing the isolation level has potential for
>>> breaking working code.
>>> 
>>> - Anssi
>>> 
>>> On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger  wrote:
 
 First of all, I would like to say that I strongly support the move to
 READ
 COMMITTED, including backporting it to 1.8.x.
 
 But we also need to explain: REPEATABLE READ is a higher transaction
 isolation
 level than READ COMMITTED. If you have problematic code, it should lead
 to
 more deadlocks and/or transactions failing at commit time (compared to
 READ
 COMMITTED), not to data loss. The reason we get data losses is MySql's
 unique
 interpretation of REPEATABLE READ. If you're interested in the details
 (and if
 you use MySql, you should be), read on.
 
 With MySql's REPEATABLE READ, the "read" operations -- SELECT statements
 --
 indeed act like they act in the usual REPEATABLE READ: Once you've read
 some
 table, changes made to that table by other transactions will not be
 visible
 within your transaction. But "write" operations -- UPDATE, DELETE,
 INSERT and
 the like -- act as if they're under READ COMMITTED, affecting (and
 affected by)
 changes committed by other transactions. The result is, essentially,
 that
 within a transaction, the reads are not guaranteed to be consistent with
 the
 writes [1].
 
 In particular, in the bug[2] that caused this discussion, we get the
 following
 behavior in one transaction:
 
 (1) BEGIN TRANSACTION
 
 (2) SELECT ... FROM some_table WHERE some_field=some_value
 (1 row returned)
 
 (3) (some other transactions commit)
 
 (4) SELECT ... FROM some_table WHERE some_field=some_value
 (1 row returned, same as above)
 
 (5) DELETE some_table WHERE some_field=some_value
 (answer: 1 row deleted)
 
 (6) SELECT ... FROM some_table WHERE some_field=some_value
 (1 row returned, same as above)
 
 (7) COMMIT
 (the row that was returned earlier is no longer in the
 database)
 
 Take a minute to read this. Up to step (5), everything is as you would
 expect;
 you should find steps (6) and (7) quite surprising.
 
 This happens because the other transactions in (3) deleted the row that
 is
 returned in (2), (4) & (6), and inserted another one where
 some_field=some_value; that other row is the row that was deleted in
 (5). The
 row that this transaction selects was not seen by the DELETE, and hence
 not
 changed by it, and hence continues to be visible by the SELECTs in our
 transaction. But when we commit, the row (which has been deleted) no
 longer
 exists.
 
 I have expressed elsewhere my opinion of this behavior as a general
 database
 feature, and feel no need to repeat it here; but I think that, if
 possible, it
 is Django's job as a framework to protect its users from it, at least as
 a
 default.
 
 On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
> 
> What performance changes can you expect doing this change? It is
> probably
> that default on MySQL for a good reason.
 
 
 The Django project is usually willing to give up quite a lot of
 performance in
 order to prevent data losses. I agree that this default on MySql is
 probably
 for a reason, but I don't think it can be a good reason for Django.
 
 Have fun,
 Shai.
 
 [1] https://d

Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Anssi Kääriäinen
This being MySQL I wouldn't be surprised if changing the isolation
level would introduce new problems. Also, user code might rely on
Django using repeatable read. If we introduce the change into stable
releases, we risk breaking sites that work perfectly well currently.
To me this is against our backwards compatibility rules which state
that we avoid doing such changes except when completely unavoidable.

 - Anssi

On Mon, Mar 21, 2016 at 10:40 AM, Shai Berger  wrote:
> My recommendation to backport is based on the observation that the peculiar
> REPEATABLE READ behavior is highly conducive to data loss in the presence
> of concurrency, combined with a sense that it is not very well known; I find
> it much more likely that the change will fix broken code than break really
> working code.
>
> On 21 במרץ 2016 09:59:25 GMT+02:00, "Anssi Kääriäinen" 
> wrote:
>>
>> I'm strongly -1 on changing the default isolation level in a minor
>> release. We can recommend users switch the level and complain loudly
>> if they don't. But just changing the isolation level has potential for
>> breaking working code.
>>
>>  - Anssi
>>
>> On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger  wrote:
>>>
>>>  First of all, I would like to say that I strongly support the move to
>>> READ
>>>  COMMITTED, including backporting it to 1.8.x.
>>>
>>>  But we also need to explain: REPEATABLE READ is a higher transaction
>>> isolation
>>>  level than READ COMMITTED. If you have problematic code, it should lead
>>> to
>>>  more deadlocks and/or transactions failing at commit time (compared to
>>> READ
>>>  COMMITTED), not to data loss. The reason we get data losses is MySql's
>>> unique
>>>  interpretation of REPEATABLE READ. If you're interested in the details
>>> (and if
>>>  you use MySql, you should be), read on.
>>>
>>>  With MySql's REPEATABLE READ, the "read" operations -- SELECT statements
>>> --
>>>  indeed act like they act in the usual REPEATABLE READ: Once you've read
>>> some
>>>  table, changes made to that table by other transactions will not be
>>> visible
>>>  within your transaction. But "write" operations -- UPDATE, DELETE,
>>> INSERT and
>>>  the like -- act as if they're under READ COMMITTED, affecting (and
>>> affected by)
>>>  changes committed by other transactions. The result is, essentially,
>>> that
>>>  within a transaction, the reads are not guaranteed to be consistent with
>>> the
>>>  writes [1].
>>>
>>>  In particular, in the bug[2] that caused this discussion, we get the
>>> following
>>>  behavior in one transaction:
>>>
>>>  (1) BEGIN TRANSACTION
>>>
>>>  (2) SELECT ... FROM some_table WHERE some_field=some_value
>>>  (1 row returned)
>>>
>>>  (3) (some other transactions commit)
>>>
>>>  (4) SELECT ... FROM some_table WHERE some_field=some_value
>>>  (1 row returned, same as above)
>>>
>>>  (5) DELETE some_table WHERE some_field=some_value
>>>  (answer: 1 row deleted)
>>>
>>>  (6) SELECT ... FROM some_table WHERE some_field=some_value
>>>  (1 row returned, same as above)
>>>
>>>  (7) COMMIT
>>>  (the row that was returned earlier is no longer in the
>>> database)
>>>
>>>  Take a minute to read this. Up to step (5), everything is as you would
>>> expect;
>>>  you should find steps (6) and (7) quite surprising.
>>>
>>>  This happens because the other transactions in (3) deleted the row that
>>> is
>>>  returned in (2), (4) & (6), and inserted another one where
>>>  some_field=some_value; that other row is the row that was deleted in
>>> (5). The
>>>  row that this transaction selects was not seen by the DELETE, and hence
>>> not
>>>  changed by it, and hence continues to be visible by the SELECTs in our
>>>  transaction. But when we commit, the row (which has been deleted) no
>>> longer
>>>  exists.
>>>
>>>  I have expressed elsewhere my opinion of this behavior as a general
>>> database
>>>  feature, and feel no need to repeat it here; but I think that, if
>>> possible, it
>>>  is Django's job as a framework to protect its users from it, at least as
>>> a
>>>  default.
>>>
>>>  On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:

  What performance changes can you expect doing this change? It is
 probably
  that default on MySQL for a good reason.
>>>
>>>
>>>  The Django project is usually willing to give up quite a lot of
>>> performance in
>>>  order to prevent data losses. I agree that this default on MySql is
>>> probably
>>>  for a reason, but I don't think it can be a good reason for Django.
>>>
>>>  Have fun,
>>>  Shai.
>>>
>>>  [1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
>>>  [2] https://code.djangoproject.com/ticket/26347
>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>

Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Shai Berger
My recommendation to backport is based on the observation that the peculiar 
REPEATABLE READ behavior is highly conducive to data loss in the presence of 
concurrency, combined with a sense that it is not very well known; I find it 
much more likely that the change will fix broken code than break really working 
code. 

On 21 במרץ 2016 09:59:25 GMT+02:00, "Anssi Kääriäinen"  
wrote:
>I'm strongly -1 on changing the default isolation level in a minor
>release. We can recommend users switch the level and complain loudly
>if they don't. But just changing the isolation level has potential for
>breaking working code.
>
> - Anssi
>
>On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger  wrote:
>> First of all, I would like to say that I strongly support the move to
>READ
>> COMMITTED, including backporting it to 1.8.x.
>>
>> But we also need to explain: REPEATABLE READ is a higher transaction
>isolation
>> level than READ COMMITTED. If you have problematic code, it should
>lead to
>> more deadlocks and/or transactions failing at commit time (compared
>to READ
>> COMMITTED), not to data loss. The reason we get data losses is
>MySql's unique
>> interpretation of REPEATABLE READ. If you're interested in the
>details (and if
>> you use MySql, you should be), read on.
>>
>> With MySql's REPEATABLE READ, the "read" operations -- SELECT
>statements --
>> indeed act like they act in the usual REPEATABLE READ: Once you've
>read some
>> table, changes made to that table by other transactions will not be
>visible
>> within your transaction. But "write" operations -- UPDATE, DELETE,
>INSERT and
>> the like -- act as if they're under READ COMMITTED, affecting (and
>affected by)
>> changes committed by other transactions. The result is, essentially,
>that
>> within a transaction, the reads are not guaranteed to be consistent
>with the
>> writes [1].
>>
>> In particular, in the bug[2] that caused this discussion, we get the
>following
>> behavior in one transaction:
>>
>> (1) BEGIN TRANSACTION
>>
>> (2) SELECT ... FROM some_table WHERE some_field=some_value
>> (1 row returned)
>>
>> (3) (some other transactions commit)
>>
>> (4) SELECT ... FROM some_table WHERE some_field=some_value
>> (1 row returned, same as above)
>>
>> (5) DELETE some_table WHERE some_field=some_value
>> (answer: 1 row deleted)
>>
>> (6) SELECT ... FROM some_table WHERE some_field=some_value
>> (1 row returned, same as above)
>>
>> (7) COMMIT
>> (the row that was returned earlier is no longer in
>the database)
>>
>> Take a minute to read this. Up to step (5), everything is as you
>would expect;
>> you should find steps (6) and (7) quite surprising.
>>
>> This happens because the other transactions in (3) deleted the row
>that is
>> returned in (2), (4) & (6), and inserted another one where
>> some_field=some_value; that other row is the row that was deleted in
>(5). The
>> row that this transaction selects was not seen by the DELETE, and
>hence not
>> changed by it, and hence continues to be visible by the SELECTs in
>our
>> transaction. But when we commit, the row (which has been deleted) no
>longer
>> exists.
>>
>> I have expressed elsewhere my opinion of this behavior as a general
>database
>> feature, and feel no need to repeat it here; but I think that, if
>possible, it
>> is Django's job as a framework to protect its users from it, at least
>as a
>> default.
>>
>> On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
>>> What performance changes can you expect doing this change? It is
>probably
>>> that default on MySQL for a good reason.
>>
>> The Django project is usually willing to give up quite a lot of
>performance in
>> order to prevent data losses. I agree that this default on MySql is
>probably
>> for a reason, but I don't think it can be a good reason for Django.
>>
>> Have fun,
>> Shai.
>>
>> [1]
>https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
>> [2] https://code.djangoproject.com/ticket/26347
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Anssi Kääriäinen
I'm strongly -1 on changing the default isolation level in a minor
release. We can recommend users switch the level and complain loudly
if they don't. But just changing the isolation level has potential for
breaking working code.

 - Anssi

On Mon, Mar 21, 2016 at 9:27 AM, Shai Berger  wrote:
> First of all, I would like to say that I strongly support the move to READ
> COMMITTED, including backporting it to 1.8.x.
>
> But we also need to explain: REPEATABLE READ is a higher transaction isolation
> level than READ COMMITTED. If you have problematic code, it should lead to
> more deadlocks and/or transactions failing at commit time (compared to READ
> COMMITTED), not to data loss. The reason we get data losses is MySql's unique
> interpretation of REPEATABLE READ. If you're interested in the details (and if
> you use MySql, you should be), read on.
>
> With MySql's REPEATABLE READ, the "read" operations -- SELECT statements --
> indeed act like they act in the usual REPEATABLE READ: Once you've read some
> table, changes made to that table by other transactions will not be visible
> within your transaction. But "write" operations -- UPDATE, DELETE, INSERT and
> the like -- act as if they're under READ COMMITTED, affecting (and affected 
> by)
> changes committed by other transactions. The result is, essentially, that
> within a transaction, the reads are not guaranteed to be consistent with the
> writes [1].
>
> In particular, in the bug[2] that caused this discussion, we get the following
> behavior in one transaction:
>
> (1) BEGIN TRANSACTION
>
> (2) SELECT ... FROM some_table WHERE some_field=some_value
> (1 row returned)
>
> (3) (some other transactions commit)
>
> (4) SELECT ... FROM some_table WHERE some_field=some_value
> (1 row returned, same as above)
>
> (5) DELETE some_table WHERE some_field=some_value
> (answer: 1 row deleted)
>
> (6) SELECT ... FROM some_table WHERE some_field=some_value
> (1 row returned, same as above)
>
> (7) COMMIT
> (the row that was returned earlier is no longer in the 
> database)
>
> Take a minute to read this. Up to step (5), everything is as you would expect;
> you should find steps (6) and (7) quite surprising.
>
> This happens because the other transactions in (3) deleted the row that is
> returned in (2), (4) & (6), and inserted another one where
> some_field=some_value; that other row is the row that was deleted in (5). The
> row that this transaction selects was not seen by the DELETE, and hence not
> changed by it, and hence continues to be visible by the SELECTs in our
> transaction. But when we commit, the row (which has been deleted) no longer
> exists.
>
> I have expressed elsewhere my opinion of this behavior as a general database
> feature, and feel no need to repeat it here; but I think that, if possible, it
> is Django's job as a framework to protect its users from it, at least as a
> default.
>
> On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
>> What performance changes can you expect doing this change? It is probably
>> that default on MySQL for a good reason.
>
> The Django project is usually willing to give up quite a lot of performance in
> order to prevent data losses. I agree that this default on MySql is probably
> for a reason, but I don't think it can be a good reason for Django.
>
> Have fun,
> Shai.
>
> [1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
> [2] https://code.djangoproject.com/ticket/26347



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Aymeric Augustin
On 21 Mar 2016, at 01:25, Cristiano Coelho  wrote:
> 
> What performance changes can you expect doing this change? It is probably 
> that default on MySQL for a good reason.

Barring implementation weirdness, reducing the isolation level is supposed to 
improve performance (if it makes a difference at all).

-- 
Aymeric.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-21 Thread Shai Berger
First of all, I would like to say that I strongly support the move to READ 
COMMITTED, including backporting it to 1.8.x.

But we also need to explain: REPEATABLE READ is a higher transaction isolation 
level than READ COMMITTED. If you have problematic code, it should lead to 
more deadlocks and/or transactions failing at commit time (compared to READ 
COMMITTED), not to data loss. The reason we get data losses is MySql's unique 
interpretation of REPEATABLE READ. If you're interested in the details (and if 
you use MySql, you should be), read on.

With MySql's REPEATABLE READ, the "read" operations -- SELECT statements -- 
indeed act like they act in the usual REPEATABLE READ: Once you've read some 
table, changes made to that table by other transactions will not be visible 
within your transaction. But "write" operations -- UPDATE, DELETE, INSERT and 
the like -- act as if they're under READ COMMITTED, affecting (and affected by) 
changes committed by other transactions. The result is, essentially, that 
within a transaction, the reads are not guaranteed to be consistent with the 
writes [1].

In particular, in the bug[2] that caused this discussion, we get the following  
behavior in one transaction:

(1) BEGIN TRANSACTION

(2) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned)

(3) (some other transactions commit)

(4) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)

(5) DELETE some_table WHERE some_field=some_value
(answer: 1 row deleted)

(6) SELECT ... FROM some_table WHERE some_field=some_value
(1 row returned, same as above)

(7) COMMIT
(the row that was returned earlier is no longer in the database)

Take a minute to read this. Up to step (5), everything is as you would expect; 
you should find steps (6) and (7) quite surprising.

This happens because the other transactions in (3) deleted the row that is 
returned in (2), (4) & (6), and inserted another one where 
some_field=some_value; that other row is the row that was deleted in (5). The 
row that this transaction selects was not seen by the DELETE, and hence not 
changed by it, and hence continues to be visible by the SELECTs in our 
transaction. But when we commit, the row (which has been deleted) no longer 
exists.

I have expressed elsewhere my opinion of this behavior as a general database 
feature, and feel no need to repeat it here; but I think that, if possible, it 
is Django's job as a framework to protect its users from it, at least as a 
default.

On Monday 21 March 2016 02:25:37 Cristiano Coelho wrote:
> What performance changes can you expect doing this change? It is probably
> that default on MySQL for a good reason.

The Django project is usually willing to give up quite a lot of performance in 
order to prevent data losses. I agree that this default on MySql is probably 
for a reason, but I don't think it can be a good reason for Django.

Have fun,
Shai.

[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-consistent-read.html
[2] https://code.djangoproject.com/ticket/26347
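For projects that don't want to wait for a framework-level change, a MySQL session can be switched to READ COMMITTED per connection through the backend's `init_command` option. A sketch of the relevant `settings.py` fragment (database name, user, and password are placeholders):

```python
# settings.py fragment -- NAME/USER/PASSWORD are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "OPTIONS": {
            # Runs on every new connection, moving the session off
            # InnoDB's default REPEATABLE READ isolation level.
            "init_command": (
                "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED"
            ),
        },
    }
}
```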


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-20 Thread Cristiano Coelho
What performance changes can you expect doing this change? It is probably 
that default on MySQL for a good reason.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-20 Thread Curtis Maloney

+1 for me, too...  this aligns with "safe/secure by default".

--
C

On 21/03/16 09:13, Aymeric Augustin wrote:

I agree with Karen.

--
Aymeric.


On 18 Mar 2016, at 22:15, Karen Tracey <kmtra...@gmail.com> wrote:

This is the 2nd major issue I can recall caused by MySQL default of
REPEATABLE READ transaction isolation level. I think Django should
simply switch itself to a default of using READ COMMITTED, consistent
with (all?) other supported database backends, and document how, if a
user really really wants to use REPEATABLE READ, they can do so (I
assume Django could allow that?), and what known problems when using
basic Django functions they may run into if they do so.

I fear our existing approach of documenting how certain functions
don't work by default on MySQL (e.g. get_or_create) is not really
helping the majority of our users. I believe switching instead to
making Django code itself work by default on MySQL would be a better
long-term solution for those who use MySQL with Django, and avoid
future cases like this one that has been discovered (years after we
knew get_or_create was broken by default transaction isolation level
on MySQL).

On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham <timogra...@gmail.com> wrote:

Could some MySQL users take a look at ticket #26347 [0] and
recommend how to proceed? I think it's probably not a new issue
but I'm a bit surprised it hasn't come up before if so.

[0] https://code.djangoproject.com/ticket/26347










Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-20 Thread Aymeric Augustin
I agree with Karen.

-- 
Aymeric.

> On 18 Mar 2016, at 22:15, Karen Tracey  wrote:
> 
> This is the 2nd major issue I can recall caused by MySQL default of 
> REPEATABLE READ transaction isolation level. I think Django should simply 
> switch itself to a default of using READ COMMITTED, consistent with (all?) 
> other supported database backends, and document how, if a user really really 
> wants to use REPEATABLE READ, they can do so (I assume Django could allow 
> that?), and what known problems when using basic Django functions they may 
> run into if they do so.
> 
> I fear our existing approach of documenting how certain functions don't work 
> by default on MySQL (e.g. get_or_create) is not really helping the majority 
> of our users. I believe switching instead to making Django code itself work 
> by default on MySQL would be a better long-term solution for those who use 
> MySQL with Django, and avoid future cases like this one that has been 
> discovered (years after we knew get_or_create was broken by default 
> transaction isolation level on MySQL).
> 
> On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham wrote:
> Could some MySQL users take a look at ticket #26347 [0] and recommend how to 
> proceed? I think it's probably not a new issue but I'm a bit surprised it 
> hasn't come up before if so.
> 
> [0] https://code.djangoproject.com/ticket/26347 
> 
> 


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-19 Thread Karen Tracey
This is the 2nd major issue I can recall caused by MySQL default of
REPEATABLE READ transaction isolation level. I think Django should simply
switch itself to a default of using READ COMMITTED, consistent with (all?)
other supported database backends, and document how, if a user really
really wants to use REPEATABLE READ, they can do so (I assume Django could
allow that?), and what known problems when using basic Django functions
they may run into if they do so.

I fear our existing approach of documenting how certain functions don't
work by default on MySQL (e.g. get_or_create) is not really helping the
majority of our users. I believe switching instead to making Django code
itself work by default on MySQL would be a better long-term solution for
those who use MySQL with Django, and avoid future cases like this one that
has been discovered (years after we knew get_or_create was broken by
default transaction isolation level on MySQL).

On Mon, Mar 14, 2016 at 11:15 AM, Tim Graham  wrote:

> Could some MySQL users take a look at ticket #26347 [0] and recommend how
> to proceed? I think it's probably not a new issue but I'm a bit surprised
> it hasn't come up before if so.
>
> [0] https://code.djangoproject.com/ticket/26347
>


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-14 Thread Cristiano Coelho
I can't tell whether it is a bug in MySQL or not, but I understood it the 
same way you did (the first example with Session A and Session B makes it 
clearer), so I can't quite understand how the poster ran into that issue. 
I would like an example similar to the one in the docs, but with a delete 
in the middle; it is a bit unclear what happens. Could a delete in the 
middle create a new "snapshot" for your next select?


So, assuming the issue is real and not a bug in MySQL:

- Would a DELETE + INSERT IGNORE fix the issue, while also improving 
many-to-many updates on backends that support INSERT IGNORE? (one less 
select)

- Would a different isolation level help? What about SERIALIZABLE? The 
only good way to do this is to have the option to change the isolation 
level on each transaction, improving the transaction.atomic() method so 
that we do not affect other db operations by changing the isolation level 
globally.

- Force a lock on the many-to-many updates (using select for update, 
maybe)?

- Use the object history to perform date matching and detect concurrent 
modifications of the same object before applying them, and raise an error?
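
On the per-transaction isolation idea: transaction.atomic() takes no such argument, so a hedged sketch of what the SQL-level mechanism could look like is below. All names are hypothetical; the helper just issues the standard SET TRANSACTION statement, which MySQL applies only to the next transaction on that connection, and the RecordingCursor stands in for a real DB-API cursor.

```python
# Sketch only: per-transaction isolation via the standard SQL statement.
VALID_LEVELS = {
    "READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE",
}

def set_next_transaction_isolation(cursor, level):
    """Apply `level` to the next transaction on this connection only."""
    if level not in VALID_LEVELS:
        raise ValueError("unknown isolation level: %s" % level)
    # SET TRANSACTION (without GLOBAL/SESSION) affects only the next
    # transaction, so other connections keep the configured default.
    cursor.execute("SET TRANSACTION ISOLATION LEVEL %s" % level)

class RecordingCursor:
    """Stand-in for a DB-API cursor; records executed statements."""
    def __init__(self):
        self.statements = []
    def execute(self, sql):
        self.statements.append(sql)
```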


On Monday, March 14, 2016, 20:12:29 (UTC-3), Shai Berger wrote:
>
> Hi,
>
> I just commented on the ticket, but wanted to clarify a few things here:
>
> On Tuesday 15 March 2016 00:48:02 Cristiano Coelho wrote:
> > The django-admin interface is quite bad at handling concurrent
> > modifications, this is one problem that might not happen on other
> > backends and is quite critical, but other issues (that ain't critical
> > like data loss but might cause unexpected errors) like two people
> > viewing a model with some inline relations, one of them deletes one of
> > the inline rows, the other one performs a save, it will try to re-save
> > the one that's actually deleted and you end up with a "please correct
> > the errors below" but you actually see no errors (because the row you
> > tried to save no longer exists and is no longer loaded). Similar errors
> > happen with other kind of concurrent modifications.
>
> Right.
>
> > As of the MySQL issue, using a higher isolation level might fix the issue
>
> For the record, the suggestion made was to use a *lower* isolation level,
> to prevent the failure of transactions; I disagree with that
> recommendation, as it "prevents" the failure by enabling the kinds of
> data corruption you described above.
>
> > but I would strongly recommend not changing it as this is mainly an issue
> > of how django performs the many-to-many save and not MySQL's fault at all.
>
> ... and there I completely disagree. Django handles the updates in a
> transaction; first it deletes records, then (if they don't exist) writes
> them. The MySql docs explain that records are "snapshotted" at the
> beginning of the transaction, and so may actually be stale later on in
> it -- but that does not hold when the changes to the records are done
> *in* the transaction.
> I asked the ticket's OP to describe a scenario where this is anything
> but a severe MySql bug, and I ask the same of you.
>
> Shai.



Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-14 Thread Shai Berger
Hi,

I just commented on the ticket, but wanted to clarify a few things here:

On Tuesday 15 March 2016 00:48:02 Cristiano Coelho wrote:
> The django-admin interface is quite bad at handling concurrent
> modifications, this is one problem that might not happen on other backends
> and is quite critical, but other issues (that ain't critical like data loss
> but might cause unexpected errors) like two people viewing a model with
> some inline relations, one of them deletes one of the inline rows, the
> other one performs a save, it will try to re-save the one that's actually
> deleted and you end up with a "please correct the errors below" but you
> actually see no errors (because the row you tried to save no longer exists
> and is no longer loaded). Similar errors happen with other kind of
> concurrent modifications.

Right.
 
> As of the MySQL issue, using a higher isolation level might fix the issue

For the record, the suggestion made was to use a *lower* isolation level, to 
prevent the failure of transactions; I disagree with that recommendation, as 
it "prevents" the failure by enabling the kinds of data corruption you 
described above.

> but I would strongly recommend not changing it as this is mainly an issue
> of how django performs the many-to-many save and not MySQL's fault at all.

... and there I completely disagree. Django handles the updates in a 
transaction; first it deletes records, then (if they don't exist) writes them. 
The MySql docs explain that records are "snapshotted" at the beginning of the 
transaction, and so may actually be stale later on in it -- but that does not 
hold when the changes to the records are done *in* the transaction.
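
The delete-then-rewrite sequence described here is also what makes concurrent saves dangerous: each session rewrites the full relation from its own, possibly stale, view, so the last writer wins. A driver-free Python simulation of that lost update (the set of tuples stands in for the m2m through table; all names are hypothetical):

```python
def m2m_save(rows, source_id, target_ids):
    """Mimic the m2m save pattern: delete this source's rows, re-insert."""
    stale = {r for r in rows if r[0] == source_id}
    rows.difference_update(stale)                    # the DELETE step
    rows.update((source_id, t) for t in target_ids)  # the INSERT step

# Book 1 starts out related to author 1; two sessions each loaded that state.
table = {(1, 1)}
m2m_save(table, 1, [1, 2])  # session A adds author 2
m2m_save(table, 1, [1, 3])  # session B, unaware of A's change, adds author 3
# table is now {(1, 1), (1, 3)}: session B silently discarded author 2.
```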
I asked the ticket's OP to describe a scenario where this is anything but a 
severe MySql bug, and I ask the same of you.

Shai.


Re: MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-14 Thread Cristiano Coelho
The django-admin interface is quite bad at handling concurrent 
modifications. This is one problem that might not happen on other backends 
and is quite critical, but there are other issues (not critical like data 
loss, though they may cause unexpected errors): for example, two people 
viewing a model with some inline relations, one of them deletes one of the 
inline rows, the other one performs a save; the save will try to re-save 
the row that was actually deleted, and you end up with a "please correct 
the errors below" message but you actually see no errors (because the row 
you tried to save no longer exists and is no longer loaded). Similar 
errors happen with other kinds of concurrent modifications.

As for the MySQL issue, using a higher isolation level might fix it, but I 
would strongly recommend not changing the default, as this is mainly an 
issue with how Django performs the many-to-many save and not MySQL's fault 
at all. If you still go down this path, it would be a good idea not to set 
the isolation-level change globally (in settings), but to have it as an 
argument to the transaction.atomic() function, so it can be changed per 
transaction.

Would it work if, rather than performing a DELETE, SELECT, INSERT combo, 
you performed a DELETE and INSERT IGNORE? I believe it is supported by 
MySQL, and PostgreSQL has similar functionality starting with 9.5; it can 
also be emulated through locks or nested queries.
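
To make the DELETE + INSERT IGNORE idea concrete, here is a hedged sketch that only builds the parameterized SQL for a hypothetical through table; it never touches a database. INSERT IGNORE is MySQL syntax, with ON CONFLICT DO NOTHING being the PostgreSQL 9.5+ analogue.

```python
# Illustration only: build the DELETE + INSERT IGNORE pair for replacing
# one source object's m2m rows. Table/column names are hypothetical.
def m2m_replace_statements(table, source_col, target_col, source_id, target_ids):
    """Return ((delete_sql, params), (insert_sql, params))."""
    delete_sql = f"DELETE FROM {table} WHERE {source_col} = %s"
    delete_params = (source_id,)
    placeholders = ", ".join(["(%s, %s)"] * len(target_ids))
    insert_sql = (
        f"INSERT IGNORE INTO {table} ({source_col}, {target_col}) "
        f"VALUES {placeholders}"
    )
    insert_params = tuple(p for t in target_ids for p in (source_id, t))
    return (delete_sql, delete_params), (insert_sql, insert_params)
```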

What if there were a way to store the last modified date of an object 
(check its history, maybe?), save that date locally on the admin form, and 
when performing the actual save compare the two and raise an error if they 
do not match? That would prevent most concurrent-modification issues 
(although it would make edits that usually have no issues a bit more 
annoying). You might still be prone to a small race condition on the 
modification date, unless you perform a select for update on it first.
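
The last-modified comparison sketched above is classic optimistic locking. A minimal, database-free illustration (all names hypothetical; as noted, a real version would read the current stamp under SELECT ... FOR UPDATE to fully close the race):

```python
# Sketch of an optimistic concurrency check: the form carries the stamp
# it was rendered with, and the save refuses to proceed if the object
# has since been modified.
class ConcurrentModification(Exception):
    """Raised when the object changed since the form was loaded."""

def check_unmodified(form_stamp, current_stamp):
    """Compare stamps before saving; raise on a mismatch."""
    if form_stamp != current_stamp:
        raise ConcurrentModification(
            f"object changed at {current_stamp}, form loaded at {form_stamp}"
        )
```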



On Monday, March 14, 2016, 12:15:31 (UTC-3), Tim Graham wrote:
>
> Could some MySQL users take a look at ticket #26347 [0] and recommend how 
> to proceed? I think it's probably not a new issue but I'm a bit surprised 
> it hasn't come up before if so.
>
> [0] https://code.djangoproject.com/ticket/26347
>



MySQL data loss possibility with concurrent ManyToManyField saves

2016-03-14 Thread Tim Graham
Could some MySQL users take a look at ticket #26347 [0] and recommend how 
to proceed? I think it's probably not a new issue but I'm a bit surprised 
it hasn't come up before if so.

[0] https://code.djangoproject.com/ticket/26347
