Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
If anybody is interested, I created proof-of-concept code for PostgreSQL 
which extends the existing Atomic context manager:

https://github.com/toudi/django-tpc-proof-of-concept

The actual implementation can be found inside tpc/atomic_tpc.py, and there 
are two commands available:

./manage.py prepare
./manage.py commit

If atomic() were extensible (i.e. if one could replace 
connection.commit() with something else), then all of this 
monkeypatching wouldn't be necessary.

What do you guys think? Also, has nobody ever written stuff like this? Or 
am I searching the wrong way?

cheers,
toudi


Re: two-phase commit / distributed transaction

2016-12-02 Thread Aymeric Augustin
Hello,

To be honest I’m pessimistic about the feasibility of emulating transactional 
behavior — pretty much the most complicated and low level thing databases do — 
in the application. I don’t think that would be considered suitable for Django.

Usually Django handles such cases with a database feature flag and makes the 
methods no-ops on databases that don’t support the corresponding features. For 
instance that’s how Django ignores transactions on MySQL + MyISAM.
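
To illustrate the feature-flag pattern, here is a minimal sketch. The flag name supports_two_phase_commit, the method prepare_transaction() and both classes are hypothetical stand-ins invented for this example; none of them are existing Django APIs.

```python
class BaseFeatures:
    # Hypothetical flag name; a backend would override it where applicable.
    supports_two_phase_commit = False


class TwoPhaseCapableFeatures(BaseFeatures):
    supports_two_phase_commit = True


class ConnectionSketch:
    """Toy stand-in for a database connection wrapper (not Django's class)."""

    def __init__(self, features):
        self.features = features
        self.statements = []  # SQL that would have been sent to the database

    def prepare_transaction(self, name):
        # On backends without the feature, the call silently does nothing.
        if not self.features.supports_two_phase_commit:
            return
        self.statements.append(f"PREPARE TRANSACTION '{name}'")


capable = ConnectionSketch(TwoPhaseCapableFeatures())
incapable = ConnectionSketch(BaseFeatures())
capable.prepare_transaction("foo")
incapable.prepare_transaction("foo")
```

With this shape, calling code does not need to know which backend it runs on, at the price of the operation being silently skipped where it is unsupported.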

Best regards,

-- 
Aymeric.


Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
Well, I suppose that it would lead either to very obfuscated implementation 
code or to very weird syntax (client code). As for your first argument (the 
promise that the transaction is unlikely to fail):

# hypothetical API, does not exist in Django
from django.db import distributed

with distributed('foo') as foo:
    MyModel.objects.get_or_create(field=123)

Then, before calling the emulated behavior, the db would have to:

* do all the operations (as it normally would for a regular commit, thus 
checking every constraint and so on)
* then do a rollback (so that it doesn't store the actual values in the db)
* then serialize them in a separate journal (the additional model I mentioned, 
analogous to PostgreSQL's own separate journal)

An utterly ugly / hacky solution if you ask me, but please keep in mind that 
this would only be an emulation of the actual algorithm for databases 
which don't support this standard.

As for the relations, I have thought a lot about it and the only pseudocode 
I could think of was utterly ugly as well:

with distributed('foo') as foo:
    foo.add(MyModel.objects.get_or_create, {'field': 123},
            namespace='mymodel')
    foo.add(MyOtherModel.objects.create,
            {'my_model_id': ('from-namespace', 'mymodel')})

Theoretically, both of these syntaxes could co-exist. If you didn't have 
any relations, you could use the cleaner syntax.
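
A minimal sketch of how such a namespace placeholder could be resolved at replay time. The Journal class and the two toy functions are hypothetical; they stand in for the proposed distributed() object and for the model managers, and nothing here is an existing Django API.

```python
import itertools


class Journal:
    """Records operations for later replay (hypothetical, not a Django API)."""

    def __init__(self):
        self.operations = []

    def add(self, func, kwargs, namespace=None):
        self.operations.append((func, kwargs, namespace))

    def replay(self):
        results = {}
        for func, kwargs, namespace in self.operations:
            resolved = {}
            for key, value in kwargs.items():
                # ('from-namespace', name) means: use the result stored under name.
                if isinstance(value, tuple) and value[:1] == ("from-namespace",):
                    value = results[value[1]]
                resolved[key] = value
            result = func(**resolved)
            if namespace is not None:
                results[namespace] = result
        return results


# Toy replacements for MyModel.objects.get_or_create / MyOtherModel.objects.create.
next_pk = itertools.count(7)
child_rows = []


def create_parent(field):
    return next(next_pk)  # the PK is only known when the operation is replayed


def create_child(my_model_id):
    child_rows.append(my_model_id)
    return len(child_rows)


journal = Journal()
journal.add(create_parent, {"field": 123}, namespace="mymodel")
journal.add(create_child, {"my_model_id": ("from-namespace", "mymodel")})
results = journal.replay()
```

Because the child operation stores a reference rather than a concrete PK, it picks up whatever value the parent gets during replay.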

So I'd say it would technically be possible but would lead to very, very, 
very ugly code (at least in the second scenario with relations). And I 
realize that this is not an option in the Django world.

I understand that, because of all the above, a nice interface that works in a 
database-agnostic way is unlikely, so Django would have to throw 
IntegrityError if somebody tried to do a distributed transaction on a 
non-supported database? But if that's the case then this 
code doesn't really belong in the Django core, does it? Which means that 
I'm probably left with the monkey-patching thing :( Or .. ? I have to 
prepare this functionality either way - because I need it ;)

Thank you for all the answers !


-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/8d4ca04f-cdcd-4d8b-a060-fc5598b3baf8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: two-phase commit / distributed transaction

2016-12-02 Thread Patryk Zawadzki
On Friday, 2 December 2016 at 12:05:11 UTC+1, Mateusz Mikołajczyk wrote:
>
> What would you say about checking which CRUD operations were executed 
> within an atomic() call (in order to serialize them and save them into a 
> special model for databases which don't support this functionality)? Is it 
> realistic? 
>

It would likely break the promise that distributed two-step transactions 
give you: that once all statements are prepared the transaction is unlikely 
to fail during commit. In this case the commit would mean "start over and 
try to repeat my steps" at which point any of the recorded statements is 
likely to fail constraint checks. (Even more so if your code used 
get_or_create().)

Also, how would relations work? You begin a transaction, create a Foo 
instance and the returned PK is 5. You assign it to child models. At this 
point the transaction is saved and rolled back. During replay the insert 
returns PK = 7. At this point there's no way to detect that some of the 
stored fives should now be treated as sevens while some should remain fives.
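
The PK drift can be reproduced with a small sqlite3 sketch. The table name, the in-memory journal list and the "other writer" are assumptions made for illustration; the concrete values differ from the 5 and 7 in the text, but the effect is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

# Emulated "prepare": run the insert, remember the PK, journal it, roll back.
cur = conn.execute("INSERT INTO foo (name) VALUES (?)", ("ours",))
recorded_pk = cur.lastrowid
journal = [("INSERT INTO foo (name) VALUES (?)", ("ours",))]
conn.rollback()

# Meanwhile another writer commits a row of its own.
conn.execute("INSERT INTO foo (name) VALUES (?)", ("theirs",))
conn.commit()

# Emulated "commit": replay the journal.
for sql, params in journal:
    replayed_pk = conn.execute(sql, params).lastrowid
conn.commit()

# The PK recorded during the first run now belongs to somebody else's row.
owner = conn.execute(
    "SELECT name FROM foo WHERE id = ?", (recorded_pk,)
).fetchone()[0]
```

Any child row journaled with recorded_pk would now point at the other writer's row instead of the replayed one.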



Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
What would you say about checking which CRUD operations were executed 
within an atomic() call (in order to serialize them and save them into a 
special model for databases which don't support this functionality)? Is it 
realistic? 

What I mean by that is that when you do:

from django.db import transaction

with transaction.atomic():
    MyModel.objects.create(field=123)

then the generated SQL is something like

BEGIN;
INSERT INTO mymodel values (123);
COMMIT;

However, if the database doesn't support the TPC functionality, the SQL 
would have to be slightly different, say:

BEGIN;
INSERT INTO prepared_transactions (txn_id, model, operation, params)
values ('foo', 'MyModel', 'create', '{"field": 123}');
COMMIT;

But on the other hand, if the database does support that, it could be 
'normal', i.e.:

BEGIN;
INSERT INTO mymodel values ( ... );
PREPARE TRANSACTION 'foo';
(no COMMIT)
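
The journal-table variant can be sketched with sqlite3 as follows. The schema, the prepare() and commit_prepared() helpers and the JSON encoding of params are assumptions for illustration only; the sketch handles just the single 'create' operation from the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (field INTEGER)")
conn.execute(
    "CREATE TABLE prepared_transactions "
    "(txn_id TEXT, model TEXT, operation TEXT, params TEXT)"
)
conn.commit()


def prepare(txn_id, model, operation, params):
    # Phase one: store a description of the operation instead of running it.
    conn.execute(
        "INSERT INTO prepared_transactions (txn_id, model, operation, params) "
        "VALUES (?, ?, ?, ?)",
        (txn_id, model, operation, json.dumps(params)),
    )
    conn.commit()


def commit_prepared(txn_id):
    # Phase two: replay the journaled operations, then clear the journal.
    rows = conn.execute(
        "SELECT model, operation, params FROM prepared_transactions "
        "WHERE txn_id = ?",
        (txn_id,),
    ).fetchall()
    for model, operation, params in rows:
        if (model, operation) != ("MyModel", "create"):
            raise ValueError(f"unsupported operation: {model}.{operation}")
        conn.execute(
            "INSERT INTO mymodel (field) VALUES (?)",
            (json.loads(params)["field"],),
        )
    conn.execute("DELETE FROM prepared_transactions WHERE txn_id = ?", (txn_id,))
    conn.commit()


prepare("foo", "MyModel", "create", {"field": 123})
before = conn.execute("SELECT field FROM mymodel").fetchall()
commit_prepared("foo")
after = conn.execute("SELECT field FROM mymodel").fetchall()
```

Note that nothing is validated against mymodel during prepare(), so the replay in commit_prepared() can still fail on constraints.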

If it is not possible to trace the CRUD operations, would it be easier to 
introduce a slightly different syntax, say ...

# hypothetical API, does not exist in Django
from django.db import prepare_distributed

with prepare_distributed('foo') as prepare:
    prepare.add_operation(MyModel.objects.create, {'field': 123})

After all, it's not like the developer doesn't know whether he's doing a 
distributed transaction or not.. 
 
As for making atomic() more complex, I don't think that it would be 
particularly hard. The distributed transaction isn't really *that* 
different - it's just calling PREPARE TRANSACTION 'foo' (without calling 
COMMIT). I thought that the Atomic class could simply have some kind of 
internal method hooks. The default class could then implement those:

from contextlib import ContextDecorator

class Atomic(ContextDecorator):
    def _commit_wrapper(self, connection):
        return connection.commit()

but the two-phase variant could do it differently:

class TwoPhaseAtomic(Atomic):
    def _commit_wrapper(self, connection):
        # prepare_distributed() is a proposed method, not an existing one
        return connection.prepare_distributed(self.distributed_transaction_id)

Of course, the prepare_distributed call would create models in the special 
table if the database didn't support the functionality and call regular 
commit() at the end, or call the appropriate command otherwise - so this 
seems like the easiest thing to do. The problem that I haven't figured out 
yet is how to trace the instances being saved / created / etc ..

On Thursday, 1 December 2016 at 14:54:27 UTC+1, Florian Apolloner wrote:
>
>
>
> On Thursday, December 1, 2016 at 2:04:53 PM UTC+1, Aymeric Augustin wrote:
>>
>> The person who will write and test that code has all my sympathy :-) 
>>
>
> I'll second that, I have no idea how Aymeric managed to keep his sanity 
> while rewriting the transaction code :D 
>



Re: two-phase commit / distributed transaction

2016-12-01 Thread Florian Apolloner


On Thursday, December 1, 2016 at 2:04:53 PM UTC+1, Aymeric Augustin wrote:
>
> The person who will write and test that code has all my sympathy :-) 
>

I'll second that, I have no idea how Aymeric managed to keep his sanity 
while rewriting the transaction code :D 



Re: two-phase commit / distributed transaction

2016-12-01 Thread Aymeric Augustin
> On 1 Dec 2016, at 13:29, Shai Berger  wrote:
> 
> On Thursday 01 December 2016 13:52:41 Aymeric Augustin wrote:
>> 
>> I’m proposing a separate context manager because I’m worried about
>> increasing again the complexity of transaction.atomic. There will be a
>> significant amount of duplication between the two implementations, though.
>> 
> I believe that making transaction.atomic more complex for this will be 
> inevitable, because the two will need to interact: If I'm in a TPC 
> transaction, and open an atomic block, it needs to be handled as part of the 
> TPC transaction. There are too many atomic blocks in the Django ecosystem, 
> including Django itself, to make the feature useful any other way.

You may be right.

The person who will write and test that code has all my sympathy :-)

-- 
Aymeric.



Re: two-phase commit / distributed transaction

2016-12-01 Thread Shai Berger
On Thursday 01 December 2016 13:52:41 Aymeric Augustin wrote:
> 
> I’m proposing a separate context manager because I’m worried about
> increasing again the complexity of transaction.atomic. There will be a
> significant amount of duplication between the two implementations, though.
> 
I believe that making transaction.atomic more complex for this will be 
inevitable, because the two will need to interact: If I'm in a TPC 
transaction, and open an atomic block, it needs to be handled as part of the 
TPC transaction. There are too many atomic blocks in the Django ecosystem, 
including Django itself, to make the feature useful any other way.
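
A toy sketch of that interaction, assuming the outermost block decides how the transaction starts and finishes while nested blocks become savepoints (which is how nested atomic blocks already behave in Django). The ToyConnection and Block classes and the log strings are invented for this example and are not Django's implementation.

```python
class ToyConnection:
    def __init__(self):
        self.log = []        # operations that would be sent to the database
        self.depth = 0       # how many blocks are currently open
        self.tpc_xid = None  # set while a two-phase transaction is active


class Block:
    """Toy stand-in for an atomic-style block (not Django's Atomic)."""

    def __init__(self, conn, xid=None):
        self.conn = conn
        self.xid = xid

    def __enter__(self):
        conn = self.conn
        if conn.depth == 0:
            if self.xid is not None:
                conn.tpc_xid = self.xid
                conn.log.append(f"tpc_begin {self.xid}")
            else:
                conn.log.append("begin")
        else:
            # Nested block: never commits on its own, only uses a savepoint.
            conn.log.append(f"savepoint s{conn.depth}")
        conn.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        conn = self.conn
        conn.depth -= 1
        failed = exc_type is not None
        if conn.depth > 0:
            action = "rollback to" if failed else "release"
            conn.log.append(f"{action} s{conn.depth}")
        elif conn.tpc_xid is not None:
            conn.log.append("tpc_rollback" if failed else "tpc_commit")
            conn.tpc_xid = None
        else:
            conn.log.append("rollback" if failed else "commit")
        return False  # never swallow exceptions


conn = ToyConnection()
with Block(conn, xid="foo"):   # two-phase block
    with Block(conn):          # ordinary nested block, e.g. from a library
        pass
```

The nested block needs to know that it is running inside a two-phase transaction only in the sense that it must not issue its own commit.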


Re: two-phase commit / distributed transaction

2016-12-01 Thread Aymeric Augustin
Hello,

Currently you cannot do this:

from django.db import connection
connection.xid()  # or any other TPC method

Adding implementations of TPC methods in BaseDatabaseWrapper that simply 
forward to the underlying connection object is the first step for a 
database-agnostic implementation in Django.
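
A minimal sketch of such forwarding. The method names come from the optional two-phase commit extension of the Python DB-API (PEP 249); WrapperSketch is a toy class rather than Django's BaseDatabaseWrapper, and a Mock stands in for the driver connection so the example runs without a database.

```python
from unittest.mock import Mock


class WrapperSketch:
    """Toy stand-in for a database wrapper; not Django's BaseDatabaseWrapper."""

    def __init__(self, raw_connection):
        self.connection = raw_connection  # the driver-level DB-API connection

    def xid(self, format_id, global_transaction_id, branch_qualifier):
        return self.connection.xid(format_id, global_transaction_id, branch_qualifier)

    def tpc_begin(self, xid):
        return self.connection.tpc_begin(xid)

    def tpc_prepare(self):
        return self.connection.tpc_prepare()

    def tpc_commit(self, *args):
        # The DB-API allows an optional xid argument here (used for recovery).
        return self.connection.tpc_commit(*args)

    def tpc_rollback(self, *args):
        return self.connection.tpc_rollback(*args)

    def tpc_recover(self):
        return self.connection.tpc_recover()


raw = Mock()
wrapper = WrapperSketch(raw)
xid = wrapper.xid(1, "foo", "branch-1")
wrapper.tpc_begin(xid)
wrapper.tpc_prepare()
wrapper.tpc_commit()
called = [call[0] for call in raw.method_calls]
```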


For the high level API, here’s one possibility:

# entering the block calls .xid(format_id, global_transaction_id,
# branch_qualifier) and .tpc_begin(xid)
with transaction.atomic2(format_id, global_transaction_id, branch_qualifier) as prepare:

    # run statements

    prepare()

    # check if others are ready to commit
    # raise an exception to abort

# exiting the block calls .tpc_commit() or .tpc_rollback() depending on
# whether there’s an exception

I’m proposing a separate context manager because I’m worried about increasing 
again the complexity of transaction.atomic. There will be a significant amount 
of duplication between the two implementations, though.
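
A rough sketch of what such a separate context manager could look like on top of the DB-API TPC methods. The name two_phase and the RecordingConnection class are assumptions for this example; a real implementation would also have to deal with autocommit state, nested atomic blocks and errors raised by the database itself.

```python
from contextlib import contextmanager


class RecordingConnection:
    """Stand-in for a DB-API connection with the optional TPC extension."""

    def __init__(self):
        self.calls = []

    def xid(self, format_id, global_transaction_id, branch_qualifier):
        return (format_id, global_transaction_id, branch_qualifier)

    def tpc_begin(self, xid):
        self.calls.append("tpc_begin")

    def tpc_prepare(self):
        self.calls.append("tpc_prepare")

    def tpc_commit(self):
        self.calls.append("tpc_commit")

    def tpc_rollback(self):
        self.calls.append("tpc_rollback")


@contextmanager
def two_phase(connection, format_id, global_transaction_id, branch_qualifier):
    xid = connection.xid(format_id, global_transaction_id, branch_qualifier)
    connection.tpc_begin(xid)
    try:
        # The block receives a callable that performs the first phase.
        yield connection.tpc_prepare
    except BaseException:
        connection.tpc_rollback()
        raise
    else:
        connection.tpc_commit()


committed = RecordingConnection()
with two_phase(committed, 1, "foo", "django") as prepare:
    prepare()  # statements would run before this call

aborted = RecordingConnection()
try:
    with two_phase(aborted, 1, "foo", "django") as prepare:
        prepare()
        raise RuntimeError("another participant is not ready to commit")
except RuntimeError:
    pass
```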


The proposal above doesn’t account for recovery: .tpc_recover(), 
.tpc_commit(xid), .tpc_rollback(xid). I’m not sure what recovery of a 
two-phase transaction is and I can’t say if it needs to be supported in the API.


Also I didn’t talk about savepoints. I assume they can be supported like in 
regular transactions.


I hope this helps,

-- 
Aymeric.

> On 1 Dec 2016, at 12:27, Mateusz Mikołajczyk wrote:
> 
> Hello, fellow devs.
> 
> I have been googling intensively in order to see whether somebody already 
> raised such an issue, but so far I have been unsuccessful. Therefore, with 
> trembling legs, I decided to write to the devlist as suggested in the docs.
> 
> I am trying to extend the atomic decorator / context manager in order to do 
> 'prepare transaction \'foo\'' rather than the usual 'commit' on a successful 
> transaction. It is, however, not the usual scenario where Django would talk 
> to multiple databases. What I have in mind is a bunch of microservices, one 
> of which would be a Django application. The Django app would therefore be 
> talking to an external transaction manager which would then take care of 
> executing the appropriate transactions inside each microservice.
> I suppose that after a bunch of hacking I could implement this with some 
> monkey patching of the original atomic() code, but it clearly is not the way 
> to go.
> 
> I then started to think about how this could be done in a database-agnostic 
> way. I know that PostgreSQL supports this with the 'prepare transaction' 
> statement, but I suppose that other databases have different syntax for this 
> kind of behaviour and some don't support this feature at all. Therefore I 
> thought that in e.g. SQLite3 (or another database which doesn't support this 
> natively), this behavior could be 'emulated'. So I thought of the following 
> pseudocode:
> 
> ```
> with transaction.atomic(commit_hook=lambda connection: 
>                         connection.prepare_transaction('foo')):
> 
> OR
> 
> with transaction.atomic(prepare_transaction='foo'):
> ```
> 
> When you do CRUD operations, they would instead be serialized into a 
> special table, and then, after issuing another command, say ..
> 
> from django.db import transaction
> transaction.commit_prepared('foo')
> 
> they would be applied using a regular 'atomic' call. (Or.. I don't know, 
> Django could raise an IntegrityError if the database doesn't support 
> distributed transactions and the code tried to execute them.)
> 
> Do you think that this is realistic or is it a wrong approach to the subject?
> 
> kind regards,
> 