Re: [Sqlalchemy-users] Transactions and Sessions, proposal #2

Daniel Miller Thu, 06 Apr 2006 20:10:04 -0700


Michael Bayer wrote:

The SQLEngine architecture, as well as the objectstore and everythingelse, heavily relies upon thread-local storage to share resources. Sofar, there has not really been much concept of a "connection" outside ofthe pooling module since it was all built in such a way theres usuallynowhere to pass a "connection" around, its just implicitly availablewhen needed (and implicitly returned to the pool when not). But it isnice to say, "conn = engine.connect()", with a nice verb and a tangiblething to hold onto. So this proposal, while not ditching the basic"implicit" behavior of SQL statement execution as well as Sessionconstruction that I think is pretty popular, introduces some more"explicitness" when its desired to use concurrent SQL connections ortransactions, or concurrent Sessions, or a combination of both. It alsosimplifies the Session interface and removes all transaction-liketerminology from the ORM package. there is actually very little newhere, its mostly a rearrangement of terminology.
Part I.  SQL construction, transactions, engines. note that we
arent dealing with the SA ORM at all, no mappers, objectstores,
they are not not involved here.
==============
# 1. engine. nothing new here. it represents a particular connectionURL using
# the dialect (to use a Hibernate term) of a particular vendor's database.

    e = create_engine('foo')

    # SQL statements constructed against the engine use it implicitly to
# execute themselves, connections are pulled from the pool behindthe scenes and released.
    mytable.update().execute()

# 2. transaction, the implicit way, slightly new but not much.

    trans = e.begin()
    try:
        mytable.update().execute()
        trans.commit()
    except:
        trans.rollback()
the "trans" object is called a SQLTransaction, which doesnt do muchexcept provide the "commit" and "rollback" function, but there you go,the "tangible transaction object" to hold onto.


Nice. Simple and explicit (it eliminates the previous ambiguity of commit 
without begin).

# 3. the connection. this is "new" (its currenly called aSQLSession...but that name changes to ConnectionProxy) generally used tosay "do this with this explicit resource". this is to make everyonesneeds for what I was calling "nested" stuff possible, but youre rightits not really "nested", so now theres a tangible thing to hold onto(well there has been for several weeks now, but now its the primary wayto do it).
    # this is a ConnectionProxy object.  it represents a single unique 
connection from the
    # pool, which it grabs when needed, or when a transaction is started it 
holds onto it.
    # this object is used automatically with the implicit stuff in #1, #2
    conn = e.connection()


Is that supposed to be e.connect()? You had mentioned "a nice verb".

The ConnectionProxy should probably have a close() method that would explicitly release 
the connection back to the pool. This is the type of thing that keeps programs tight and 
squeaky (i.e. deallocating unused resources when they are no longer needed). Right now 
CPython immediately cleans up objects when they go out of scope, but the python-devs are 
always very adamant about not depending on this since it could change if a new garbage 
collection scheme were adopted. Not to mention, objects often don't go out of scope until 
long after their needed anyway. This type of "implicit" resource management is 
how you get memory leaks in a dynamic language.

# 3a. transactions on the connection. same deal, same SQLTransactionobject:
    trans = conn.begin()
    trans.commit()
# 3b. but heres the new thing. since we have this nice tangibleConnectionProxy, lets tangibly use it.# We stick "using()" on the base ClauseElement (which is the base of allTable/Select/Alias/etc objects),
# so you can use different connections all over the place.

    mytable.using(conn).select().execute()
    mytable.select().using(conn).execute()


What's the difference between these two? I think it should only allow the 
first. But then again, I'd rather just have this:

Query(mytable, connection=conn).select()

and don't bother with using() on the table.


# or maybe we do the keyword thing
    mytable.select(connection=conn).execute()


Nah, aren't select params usually clause elements?

# 3c. or if you want the "implicit" stuff to use this ConnectionProxy,to bypass current transactional state, introduce new tranasctionalstate, or even connect to some other database, and have it affecteverything going forward (on the current thread), do this:
    engine.set_current(connection)  # now everyone uses 'connection'
    engine.restore_previous()   # put the old one back.


I see what your doing with this, but does it have to be part of the engine? I 
would say the engine's job is to create new connections and to compile queries 
into a particular dialect of SQL. It looks like the engine also manages 
connections and transactions. The connection/transaction manager should reside 
in something like the SQLSession, whose job is to track connections and 
transactions. Could set_current/restore_previous reside there instead? Try to 
keep the engine doing just a few things well rather than everything for 
everyone.


# 3d. 'using' can take engines or connections:

    ...

    t.select().using(e2).execute()

    t.using(e1.connection()).select().execute()


What does this mean? Will using() pull a connection out of the engine on each 
execute() or will it execute every query on the same connection? This API is 
mushy: it's starting to seem like you can pass any object into any function and 
you just might get something back. However, all this versatility comes at a 
cost: there are so many different combinations of functions and parameters that 
work together and I can't remember which one does what. Also, debugging becomes 
harder because there are more ways to pass in something that works but returns 
the wrong result or executes it in the wrong context. Since the using() method 
already takes a connection and it's easy enough to get the connection from the 
engine, I vote to just support that version. Keep it simple.

# 3e. switch the current engine to use a different engine's connection.this is a hacky thing. There is a user who is writing his own"clustered database" solution involving data stored in redundant tablesacross multiple DBs, he needed something exactly like this:
    e1 = create_engine('postgres://foo')
    e2 = create_engine('postgres://bar')
e1.set_current(e2.connection()) # now everyone that uses e1,secretly uses e2
    e1.restore_previous()  # put it back
# in theory, things should be designed so that the "dialect" of e2 getsused also if you swap it into e1. might need a little more internalrearranging for that.


Eeek! I don't like that at all...which engine/connection/transaction/whatever 
to I have here? If someone really needs this then they can implement it as a 
wrapper around engine (I'm assuming that would be fairly trivial or you 
wouldn't have included it here).


# 4. natural nesting behavior

As always, begin/commit/rollback uses the "outermost nested" idea, where"nested" calls to begin/commit dont affect the single "outermost"transaction.


e = create_engine('foo')
trans = e.begin()
try:
    mytable.update().execute()

    # then in some other function..
    trans2 = e.begin()
    try:
        myothertable.update.execute()
        trans2.commit()   # doesnt commit
    except:
        trans2.rollback()   # rolls trans2, trans

    trans.commit()   # commits mytable, myothertable
except:
    trans.rollback()  # rolls back trans


This is good. All "begin()" methods need to return a transaction object that 
can either be committed or rolled back. This removes a critical ambiguity.

So nesting just means that kind of nesting. The other thing i wascalling nesting, yes, its just two concurrent connections, its notreally "nesting", for that you just use a second ConnectionProxy, andyou can optionally use set_current/restore_previous to make it take overthe "implicit" scope.


I'm not a fan of this implicit scope that used all over SA. It's too much magic. It makes it hard 
to debug because things will seem to work even if they're not happening in the correct scope. 
There's no easy way to tell that you've wandered out of the correct scope because it just defaults 
to some "global" scope and goes merrily on it's way. I'd rather have an exception if I'm 
not executing in the correct scope: fail early. Allow optional use of something like the 
AutoConnectEngine when we want to "make a new one if one does not already exist" (even 
have it be the default if you want, but allow it to be substituted with something more strict).

So to sum up Part I, heres the full vocabulary to learn regardingconnections, engines, transactions: SQLEngine, ConnectionProxy,SQLTransaction. begin(), connection(), using(), commit(), rollback(),set_current(), restore_previous().
====================================
Part II. Objectstore stuff. The ORM is a system for emitting SQL. ituses the underlying SQL construction to do its work. There is theobjectstore module, the Mapper object and friends, and the Sessionobject, and thats mostly it. The Session object gets simplified,begin()/commit() go away (theyll hang around in deprecated state), andwith regards to scoping, you get the basic terminology flush(),Session(), mapper.using(), set_current() and restore_previous(), andthats about it. so all the transactional terminology is removed fromobjectstore/session since Session doesnt have much to do withtranasctions, except that it uses one for its own flush() operation. ifyou want your own SQL transaction demarcation, you use the engine stuffas normal, so only one set of concepts to learn. The Session is no morethan a "bag of objects" which can "flush" SQL statements to the database.
# 5.  simple objectstore work

obj = mapper.get(1)
obj.foo = 'bar'
objectstore.flush()  # flush everything

obj2 = mapper.get(2)
obj2.foo = 'bar'
objectstore.flush(obj, obj2)  # flush just two objects
# 6. multiple objectstore sessions. these are just two separate "bagsof objects". they arent doing anything explicit with the underlyingSQLEngines so are therefore using whatever ConnectionProxy's are currentfor them.
obj1 = mapper.get(1)
obj1.foo = 'bar'

sess = Session()  # second session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush()  # saves obj2

objectstore.flush()  # saves obj1
# 7. multiple objectstore sessions, sharing the same identity mapspace. this is similar to when we were using "begin()/commit()" withthe Session.
obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session(new_imap=False) # shares the identity map from thecurrent thread's session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush()  # saves obj2

objectstore.flush()  # saves obj1
# 8. setting a new Session to be the current thread's session, so thatall subsequent calls use it automatically.
obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session()  # new session
objectstore.set_current(sess)  # set it as current
try:
    obj2 = mapper.get(2)
    obj2.foo = 'bar'
    objectstore.flush() # saves obj2
finally:
    objectstore.restore_previous()
objectstore.flush()  # saves obj1


I'm fine with this as long as the thread-local pattern can be completely 
disabled.

# so really, what before was session.begin()/commit(), is now more orless likeobjectstore.set_current(Session(new_imap=False))/objectstore.flush()/objectstore.restore_previous().I guess the use case for this was so that you can write a function thatworks with some objects and then saves them, without affecting theobjects that were manipulated outside of the function...but probablyeasier to just use the mapper.using() syntax. if people want *exactly*what was begin/commit before, within a simple pair of functions asopposed to this more verbose style, we can do that too...but they are*not* going to be called begin/commit. too confusing !
============================

transactions + objectstore stuff
# 9. the Session is just an object that knows how to spit out a bunch ofUPDATE etc. statements. Transactions are used via the engine as fromPart I:
e = create_engine('foo')
trans = e.begin()
try:
    obj1 = mapper.get(1)
    obj1.foo = 'bar'
    objectstore.flush()
    trans.commit()
except:
    trans.rollback()
# 10. A shortcut for when you want a particular ConnectionProxy (or setof ConnectionProxies) to go into effect corresponding to a particularSession. This ConnectionProxy gets used for all operations with thisSession which involve the given ConnectionProxy's underlying engine:
e = create_engine('...')
conn = e.connection()
sess = Session(connections=[conn])


Why multiple connections? How will it decide which connection to use? Or are 
you thinking of multicasting the SQL to all connections? :)

obj = mapper.using(sess).get(3) # all SQL for conn's engine will usethe conn above for this operation
sess.flush()  # commit using conn
objectstore.set_current(sess) # now all ORM operations on both Mapperand Session use conn implicitly
objectstore.restore_previous()


I hope I haven't been too harsh. I'm trying to give honest feedback about what 
I think will be useful. Personally, I'd rather be force to change some things 
in my code that uses SA rather than be forced to live with warts forever for 
the sake of backward compatibility.

One recurring theme we seem to be butting heads on is that you like to have 
extra scoping code all over the place. While this may add a lot of flexibility, 
it also adds a lot of complexity which is not needed in most cases. Things like 
that are usually easy to add as an wrapper or extension when needed. This 
approach also incurs no extra cost when they are not needed.

In summary, I like the new transaction interface, and changing commit() to 
flush() on the Session is a must. However, I think the engine (schema?), table, 
and mapper should stick to what they do best: provide a high-level wrapper 
around the Python DBAPI. They should have very little or nothing to do with 
managing connection, transaction, or session scope, which is the job of 
frameworks built on top of them.

~ Daniel


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Sqlalchemy-users mailing list
Sqlalchemy-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users

Re: [Sqlalchemy-users] Transactions and Sessions, proposal #2

Reply via email to