Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
If anybody is interested, I created proof-of-concept code for PostgreSQL 
which extends the existing Atomic context:

https://github.com/toudi/django-tpc-proof-of-concept

The actual implementation can be found inside tpc/atomic_tpc.py, and there 
are two commands available:

./manage.py prepare
./manage.py commit

If atomic() were extensible (i.e. if one could replace 
connection.commit() with something else), then this whole lot of 
monkeypatching wouldn't be necessary.

What do you guys think? Also, has nobody ever written stuff like this? Or 
am I searching the wrong way?

cheers,
toudi

On Friday, December 2, 2016 at 10:02:44 PM UTC+1, Aymeric Augustin wrote:
>
> Hello,
>
> To be honest I’m pessimistic about the feasibility of emulating 
> transactional behavior — pretty much the most complicated and low level 
> thing databases do — in the application. I don’t think that would be 
> considered suitable for Django.
>
> Usually Django handles such cases with a database feature flag and makes 
> the methods no-ops on databases that don’t support the corresponding 
> features. For instance, that’s how Django ignores transactions on MySQL + 
> MyISAM.
>
> Best regards,
>
> -- 
> Aymeric.

Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
Well, I suppose that it would lead either to very obfuscated implementation 
code or to very weird syntax (client code). As for your first argument (the 
promise that the transaction is unlikely to fail):

# hypothetical API, not part of Django
from django.db import distributed

with distributed('foo') as foo:
    MyModel.objects.get_or_create(field=123)

Then, before calling the emulated behavior, the db would have to:

* do all the operations (as it normally would with a regular commit, 
  thus checking every constraint and so on)
* then do a rollback (so that it doesn't store the actual values in the db)
* then serialize them in a separate journal (the additional model I 
  mentioned, an analogy to PostgreSQL's actual separate journal)
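
A minimal sketch of the three steps above, using the standard-library sqlite3 module rather than the Django ORM. The function names, the `prepared_transactions` table layout and the JSON payload format are assumptions made only for illustration; nothing here is an existing Django API.

```python
import json
import sqlite3


def emulated_prepare(conn, txn_id, operations):
    """Validate `operations`, roll them back, then journal them under `txn_id`.

    `operations` is a list of (sql, params) pairs. The journal table is a
    hypothetical stand-in for the "additional model" mentioned above.
    """
    # Step 1: run every statement so the database checks its constraints.
    try:
        for sql, params in operations:
            conn.execute(sql, params)
    finally:
        # Step 2: discard the effects whether or not validation succeeded.
        conn.rollback()
    # Step 3: store the statements so a later "commit" can replay them.
    conn.execute(
        "INSERT INTO prepared_transactions (txn_id, payload) VALUES (?, ?)",
        (txn_id, json.dumps(operations)),
    )
    conn.commit()


def emulated_commit(conn, txn_id):
    """Replay the journaled statements; unlike real 2PC this can still fail."""
    row = conn.execute(
        "SELECT payload FROM prepared_transactions WHERE txn_id = ?", (txn_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no prepared transaction {txn_id!r}")
    try:
        for sql, params in json.loads(row[0]):
            conn.execute(sql, params)
        conn.execute(
            "DELETE FROM prepared_transactions WHERE txn_id = ?", (txn_id,)
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (field INTEGER UNIQUE)")
conn.execute(
    "CREATE TABLE prepared_transactions (txn_id TEXT PRIMARY KEY, payload TEXT)"
)
conn.commit()

emulated_prepare(conn, "foo", [("INSERT INTO mymodel (field) VALUES (?)", [123])])
rows_after_prepare = conn.execute("SELECT COUNT(*) FROM mymodel").fetchone()[0]
emulated_commit(conn, "foo")
rows_after_commit = conn.execute("SELECT COUNT(*) FROM mymodel").fetchone()[0]
```

Note that the validation in step 1 only proves the statements were valid at prepare time; another writer can invalidate them before the replay.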

Utterly ugly / hacky solution if you ask me, but please keep in mind that 
this would only be an emulation of the actual algorithm for the databases 
which don't support this standard.

As for the relations, I have thought a lot about it and the only pseudocode 
I could think of was utterly ugly as well:

with distributed('foo') as foo:
    foo.add(MyModel.objects.get_or_create, {'field': 123},
            namespace='mymodel')
    foo.add(MyOtherModel.objects.create,
            {'my_model_id': ('from-namespace', 'mymodel')})

Theoretically, both of these syntaxes could co-exist. If you didn't have 
any relations, you could use the cleaner syntax.

So I'd say it would technically be possible but would lead to very, very, 
very ugly code (at least in the second scenario with relations). And I 
realize that this is not an option in the Django world.

I understand that, because of all the above, a nice interface that works in 
a database-agnostic way is unlikely, and therefore Django would have to 
throw IntegrityError if somebody tried to do a distributed transaction on 
an unsupported database? But if that's the case, then this code doesn't 
really belong in the Django core, does it? Which means that I'm probably 
left with the monkey-patching thing :( Or .. ? I have to prepare this 
functionality either way, because I need it ;)

Thank you for all the answers!

On Friday, December 2, 2016 at 2:32:51 PM UTC+1, Patryk Zawadzki wrote:
>
> On Friday, December 2, 2016 at 12:05:11 PM UTC+1, Mateusz Mikołajczyk 
> wrote:
>>
>> What would you say about checking which CRUD operations were executed 
>> within an atomic() call (in order to serialize them and save them into a 
>> special model for databases which don't support this functionality)? Is 
>> it realistic? 
>>
>
> It would likely break the promise that distributed two-step transactions 
> give you: that once all statements are prepared the transaction is unlikely 
> to fail during commit. In this case the commit would mean "start over and 
> try to repeat my steps" at which point any of the recorded statements is 
> likely to fail constraint checks. (Even more so if your code used 
> get_or_create().)
>
> Also, how would relations work? You begin a transaction, create a Foo 
> instance and the returned PK is 5. You assign it to child models. At this 
> point the transaction is saved and rolled back. During replay the insert 
> returns PK = 7; at this point there's no way to detect that some of the 
> stored fives should now be treated as sevens while some should remain fives.
>
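
The primary-key drift described in the quoted reply can be reproduced with a small sqlite3 script. This is only an illustration under assumed table names (`foo`, `child`) and a hand-written journal; the PKs come out as 1 and 2 here rather than 5 and 7, but the effect is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE child (id INTEGER PRIMARY KEY, foo_id INTEGER, note TEXT)"
)
conn.commit()

# "Prepare": run the insert, record the statements together with the PK that
# was seen at that time, then roll everything back.
recorded_pk = conn.execute("INSERT INTO foo (name) VALUES ('mine')").lastrowid
journal = [
    ("INSERT INTO foo (name) VALUES (?)", ("mine",)),
    ("INSERT INTO child (foo_id, note) VALUES (?, ?)", (recorded_pk, "my child")),
]
conn.rollback()

# Before the replay, an unrelated writer inserts its own row.
conn.execute("INSERT INTO foo (name) VALUES ('someone else')")
conn.commit()

# "Commit": replay the journal verbatim and note the PK of the parent insert.
replayed_pk = None
for sql, params in journal:
    cursor = conn.execute(sql, params)
    if replayed_pk is None:
        replayed_pk = cursor.lastrowid
conn.commit()

# The child still carries the recorded PK, so it now points at the other
# writer's row instead of the parent created in the same transaction.
parent_name = conn.execute(
    "SELECT foo.name FROM child JOIN foo ON foo.id = child.foo_id"
).fetchone()[0]
```

No error is raised during the replay, which is what makes this failure mode hard to detect.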

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/8d4ca04f-cdcd-4d8b-a060-fc5598b3baf8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: two-phase commit / distributed transaction

2016-12-02 Thread Mateusz Mikołajczyk
What would you say about checking which CRUD operations were executed 
within an atomic() call (in order to serialize them and save them into a 
special model for databases which don't support this functionality)? Is it 
realistic? 

What I mean by that is that when you do:

from django.db import transaction

with transaction.atomic():
    MyModel.objects.create(field=123)

then the generated SQL is something like

BEGIN;
INSERT INTO mymodel values (123);
COMMIT;

However, if the database doesn't support the TPC functionality, the SQL 
would have to be slightly different, say:

BEGIN;
INSERT INTO prepared_transactions (txn_id, model, operation, params) values 
('foo', 'MyModel', 'create', '{"field": 123}');
COMMIT;

But on the other hand, if the database does support that, it could be 
'normal', i.e.:

BEGIN;
INSERT INTO mymodel values ( ... );
PREPARE TRANSACTION 'foo';
(no COMMIT)
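
For reference, PostgreSQL's second phase is `COMMIT PREPARED 'foo'` or `ROLLBACK PREPARED 'foo'`, issued later and possibly from another session. A small sketch of helpers that build the three statements; the function names are made up for this example, and the quoting assumes PostgreSQL's default `standard_conforming_strings = on`.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def prepare_sql(txn_id: str) -> str:
    # Phase one: detach the current transaction and persist it under txn_id.
    return f"PREPARE TRANSACTION {quote_literal(txn_id)}"


def commit_prepared_sql(txn_id: str) -> str:
    # Phase two, success path.
    return f"COMMIT PREPARED {quote_literal(txn_id)}"


def rollback_prepared_sql(txn_id: str) -> str:
    # Phase two, failure path.
    return f"ROLLBACK PREPARED {quote_literal(txn_id)}"
```

The server also has to allow prepared transactions: `max_prepared_transactions` defaults to 0, which disables them.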

If it is not possible to trace the CRUD operations, would it be easier to 
introduce a slightly different syntax, say ...

# hypothetical API, not part of Django
from django.db import prepare_distributed

with prepare_distributed('foo') as prepare:
    prepare.add_operation(MyModel.objects.create, {'field': 123})

After all, it's not like the developer doesn't know whether he's doing a 
distributed transaction or not.. 
 
As for making atomic() more complex, I don't think that it would be 
particularly hard. The distributed transaction isn't really *that* 
different: it's just calling PREPARE TRANSACTION 'foo' (without calling 
COMMIT). I thought that the Atomic class could simply have some kind of 
inner method hooks. The default class could then implement those:

class Atomic(ContextDecorator):
    def _commit_wrapper(self, connection):
        return connection.commit()

but the two-phase one could do it differently:

class TwoPhaseAtomic(Atomic):
    def _commit_wrapper(self, connection):
        return connection.prepare_distributed(
            self.distributed_transaction_id)

Of course, the prepare_distributed call would create models in the special 
table if the database didn't support the functionality and call regular 
commit() at the end, or call the appropriate command otherwise, so this 
seems like the easiest thing to do. The problem that I haven't figured out 
yet is how to trace the instances being saved / created / etc.
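
The hook idea can be tried out without Django. In the sketch below, `Atomic` is a deliberately simplified stand-in (not Django's real class, which also handles savepoints and nesting), and `FakeConnection` with its `prepare_distributed` method is hypothetical; it only records which calls were made.

```python
class FakeConnection:
    """Records calls instead of talking to a database (illustration only)."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def prepare_distributed(self, txn_id):
        self.calls.append(f"prepare:{txn_id}")


class Atomic:
    """Simplified transaction context manager with an overridable hook."""

    def __init__(self, connection):
        self.connection = connection

    def _commit_wrapper(self, connection):
        return connection.commit()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self._commit_wrapper(self.connection)
        else:
            self.connection.rollback()
        return False  # never swallow the exception


class TwoPhaseAtomic(Atomic):
    def __init__(self, connection, distributed_transaction_id):
        super().__init__(connection)
        self.distributed_transaction_id = distributed_transaction_id

    def _commit_wrapper(self, connection):
        # Stop after phase one instead of committing.
        return connection.prepare_distributed(self.distributed_transaction_id)


conn = FakeConnection()
with Atomic(conn):
    pass
with TwoPhaseAtomic(conn, "foo"):
    pass
try:
    with TwoPhaseAtomic(conn, "bar"):
        raise ValueError("boom")
except ValueError:
    pass
```

Only the success path differs between the two classes; an exception inside the block still leads to a plain rollback.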

On Thursday, December 1, 2016 at 2:54:27 PM UTC+1, Florian Apolloner wrote:
>
>
>
> On Thursday, December 1, 2016 at 2:04:53 PM UTC+1, Aymeric Augustin wrote:
>>
>> The person who will write and test that code has all my sympathy :-) 
>>
>
> I'll second that, I have no idea how Aymeric managed to keep his sanity 
> while rewriting the transaction code :D 
>



two-phase commit / distributed transaction

2016-12-01 Thread Mateusz Mikołajczyk
Hello, fellow devs.

I have been googling intensively in order to see whether somebody has 
already raised this issue, but so far I have been unsuccessful. Therefore, 
with trembling legs, I decided to write to the devlist as suggested in the 
docs.

I am trying to extend the atomic decorator / context manager in order to 
do 'prepare transaction \'foo\'' rather than the usual 'commit' on a 
successful transaction. It is, however, not the usual scenario where Django 
would talk to multiple databases. What I have in mind is a bunch of 
microservices, one of which would be a Django application. The Django app 
would therefore be talking to an external transaction manager, which would 
then take care of executing the appropriate transactions inside each 
microservice.
I suppose that after a bunch of hacking I could implement this with some 
monkey patching of the original atomic() code, but that clearly is not the 
way to go.
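
To make the role of the external transaction manager concrete, here is a rough sketch of the textbook two-phase commit flow it would drive. The `Participant` class is a hypothetical in-memory stand-in for a microservice; a real coordinator would also need durable logging and timeouts, which are left out.

```python
class Participant:
    """Hypothetical in-memory participant; a real one would wrap a service."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self, txn_id):
        if not self.can_prepare:
            raise RuntimeError(f"{self.name} cannot prepare {txn_id}")
        self.state = "prepared"

    def commit(self, txn_id):
        self.state = "committed"

    def rollback(self, txn_id):
        self.state = "rolled back"


def run_two_phase_commit(txn_id, participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase one: ask everybody to prepare; any failure aborts the whole thing.
    try:
        for participant in participants:
            participant.prepare(txn_id)
    except Exception:
        for participant in participants:
            participant.rollback(txn_id)
        return False
    # Phase two: everybody promised to succeed, so tell them all to commit.
    for participant in participants:
        participant.commit(txn_id)
    return True


django_app = Participant("django-app")
billing = Participant("billing")
all_committed = run_two_phase_commit("foo", [django_app, billing])

flaky = Participant("flaky", can_prepare=False)
orders = Participant("orders")
aborted = run_two_phase_commit("bar", [orders, flaky])
```

In this picture the Django application is just one participant: its `prepare` would end the local transaction with PREPARE TRANSACTION instead of COMMIT.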

I then started to think about how this could be done in a database-agnostic 
way. I know that PostgreSQL supports this with the 'prepare transaction' 
statement, but I suppose that other databases have different syntax for 
this kind of behaviour and some don't support this feature at all. So I 
thought that in e.g. SQLite3 (or another database which doesn't support 
this natively), this behavior could be 'emulated', and came up with the 
following pseudocode:

```
with transaction.atomic(commit_hook=lambda connection:
                        connection.prepare_transaction('foo')):
    ...

# OR

with transaction.atomic(prepare_transaction='foo'):
    ...
```

When you do CRUD operations, they would instead be serialized inside a 
special table, and then, after issuing another command, say ..

from django.db import transaction
transaction.commit_prepared('foo')

they would be applied using a regular 'atomic' call. (Or.. I don't know, 
Django could raise an IntegrityError if the database doesn't support 
distributed transactions and the code tries to execute them.)
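
The "raise an error on unsupported databases" alternative could look roughly like the sketch below. The `supports_two_phase_commit` flag, the `Backend` class and the local `NotSupported` exception are all assumptions made for this example. A "not supported" style error is used instead of IntegrityError, since the latter normally signals a constraint violation; Django's wrappers around the DB-API exceptions do include a NotSupportedError.

```python
from dataclasses import dataclass, field


class NotSupported(Exception):
    """Raised when the backend cannot do what was asked (local to this sketch)."""


@dataclass
class Features:
    # Hypothetical flag; not a documented Django feature flag.
    supports_two_phase_commit: bool = False


@dataclass
class Backend:
    name: str
    features: Features = field(default_factory=Features)

    def prepare_transaction(self, txn_id: str) -> str:
        # Refuse early instead of pretending the transaction was prepared.
        if not self.features.supports_two_phase_commit:
            raise NotSupported(f"{self.name} has no two-phase commit support")
        return f"PREPARE TRANSACTION '{txn_id}'"


postgres = Backend("postgresql", Features(supports_two_phase_commit=True))
sqlite = Backend("sqlite")

statement = postgres.prepare_transaction("foo")
try:
    sqlite.prepare_transaction("foo")
    refused = False
except NotSupported:
    refused = True
```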

Do you think that this is realistic or is it a wrong approach to the 
subject?

kind regards,
