Michael Bayer wrote:
The SQLEngine architecture, as well as the objectstore and everything
else, heavily relies upon thread-local storage to share resources. So
far, there has not really been much concept of a "connection" outside of
the pooling module since it was all built in such a way theres usually
nowhere to pass a "connection" around, its just implicitly available
when needed (and implicitly returned to the pool when not). But it is
nice to say, "conn = engine.connect()", with a nice verb and a tangible
thing to hold onto. So this proposal, while not ditching the basic
"implicit" behavior of SQL statement execution as well as Session
construction that I think is pretty popular, introduces some more
"explicitness" when its desired to use concurrent SQL connections or
transactions, or concurrent Sessions, or a combination of both. It also
simplifies the Session interface and removes all transaction-like
terminology from the ORM package. there is actually very little new
here, its mostly a rearrangement of terminology.
Part I. SQL construction, transactions, engines. note that we
arent dealing with the SA ORM at all, no mappers, objectstores,
they are not not involved here.
==============
# 1. engine. nothing new here. it represents a particular connection
URL using
# the dialect (to use a Hibernate term) of a particular vendor's database.
e = create_engine('foo')
# SQL statements constructed against the engine use it implicitly to
# execute themselves, connections are pulled from the pool behind
the scenes and released.
mytable.update().execute()
# 2. transaction, the implicit way, slightly new but not much.
trans = e.begin()
try:
mytable.update().execute()
trans.commit()
except:
trans.rollback()
the "trans" object is called a SQLTransaction, which doesnt do much
except provide the "commit" and "rollback" function, but there you go,
the "tangible transaction object" to hold onto.
Nice. Simple and explicit (it eliminates the previous ambiguity of commit
without begin).
# 3. the connection. this is "new" (its currenly called a
SQLSession...but that name changes to ConnectionProxy) generally used to
say "do this with this explicit resource". this is to make everyones
needs for what I was calling "nested" stuff possible, but youre right
its not really "nested", so now theres a tangible thing to hold onto
(well there has been for several weeks now, but now its the primary way
to do it).
# this is a ConnectionProxy object. it represents a single unique
connection from the
# pool, which it grabs when needed, or when a transaction is started it
holds onto it.
# this object is used automatically with the implicit stuff in #1, #2
conn = e.connection()
Is that supposed to be e.connect()? You had mentioned "a nice verb".
The ConnectionProxy should probably have a close() method that would explicitly release
the connection back to the pool. This is the type of thing that keeps programs tight and
squeaky (i.e. deallocating unused resources when they are no longer needed). Right now
CPython immediately cleans up objects when they go out of scope, but the python-devs are
always very adamant about not depending on this since it could change if a new garbage
collection scheme were adopted. Not to mention, objects often don't go out of scope until
long after their needed anyway. This type of "implicit" resource management is
how you get memory leaks in a dynamic language.
# 3a. transactions on the connection. same deal, same SQLTransaction
object:
trans = conn.begin()
trans.commit()
# 3b. but heres the new thing. since we have this nice tangible
ConnectionProxy, lets tangibly use it.
# We stick "using()" on the base ClauseElement (which is the base of all
Table/Select/Alias/etc objects),
# so you can use different connections all over the place.
mytable.using(conn).select().execute()
mytable.select().using(conn).execute()
What's the difference between these two? I think it should only allow the
first. But then again, I'd rather just have this:
Query(mytable, connection=conn).select()
and don't bother with using() on the table.
# or maybe we do the keyword thing
mytable.select(connection=conn).execute()
Nah, aren't select params usually clause elements?
# 3c. or if you want the "implicit" stuff to use this ConnectionProxy,
to bypass current transactional state, introduce new tranasctional
state, or even connect to some other database, and have it affect
everything going forward (on the current thread), do this:
engine.set_current(connection) # now everyone uses 'connection'
engine.restore_previous() # put the old one back.
I see what your doing with this, but does it have to be part of the engine? I
would say the engine's job is to create new connections and to compile queries
into a particular dialect of SQL. It looks like the engine also manages
connections and transactions. The connection/transaction manager should reside
in something like the SQLSession, whose job is to track connections and
transactions. Could set_current/restore_previous reside there instead? Try to
keep the engine doing just a few things well rather than everything for
everyone.
# 3d. 'using' can take engines or connections:
...
t.select().using(e2).execute()
t.using(e1.connection()).select().execute()
What does this mean? Will using() pull a connection out of the engine on each
execute() or will it execute every query on the same connection? This API is
mushy: it's starting to seem like you can pass any object into any function and
you just might get something back. However, all this versatility comes at a
cost: there are so many different combinations of functions and parameters that
work together and I can't remember which one does what. Also, debugging becomes
harder because there are more ways to pass in something that works but returns
the wrong result or executes it in the wrong context. Since the using() method
already takes a connection and it's easy enough to get the connection from the
engine, I vote to just support that version. Keep it simple.
# 3e. switch the current engine to use a different engine's connection.
this is a hacky thing. There is a user who is writing his own
"clustered database" solution involving data stored in redundant tables
across multiple DBs, he needed something exactly like this:
e1 = create_engine('postgres://foo')
e2 = create_engine('postgres://bar')
e1.set_current(e2.connection()) # now everyone that uses e1,
secretly uses e2
e1.restore_previous() # put it back
# in theory, things should be designed so that the "dialect" of e2 gets
used also if you swap it into e1. might need a little more internal
rearranging for that.
Eeek! I don't like that at all...which engine/connection/transaction/whatever
to I have here? If someone really needs this then they can implement it as a
wrapper around engine (I'm assuming that would be fairly trivial or you
wouldn't have included it here).
# 4. natural nesting behavior
As always, begin/commit/rollback uses the "outermost nested" idea, where
"nested" calls to begin/commit dont affect the single "outermost"
transaction.
e = create_engine('foo')
trans = e.begin()
try:
mytable.update().execute()
# then in some other function..
trans2 = e.begin()
try:
myothertable.update.execute()
trans2.commit() # doesnt commit
except:
trans2.rollback() # rolls trans2, trans
trans.commit() # commits mytable, myothertable
except:
trans.rollback() # rolls back trans
This is good. All "begin()" methods need to return a transaction object that
can either be committed or rolled back. This removes a critical ambiguity.
So nesting just means that kind of nesting. The other thing i was
calling nesting, yes, its just two concurrent connections, its not
really "nesting", for that you just use a second ConnectionProxy, and
you can optionally use set_current/restore_previous to make it take over
the "implicit" scope.
I'm not a fan of this implicit scope that used all over SA. It's too much magic. It makes it hard
to debug because things will seem to work even if they're not happening in the correct scope.
There's no easy way to tell that you've wandered out of the correct scope because it just defaults
to some "global" scope and goes merrily on it's way. I'd rather have an exception if I'm
not executing in the correct scope: fail early. Allow optional use of something like the
AutoConnectEngine when we want to "make a new one if one does not already exist" (even
have it be the default if you want, but allow it to be substituted with something more strict).
So to sum up Part I, heres the full vocabulary to learn regarding
connections, engines, transactions: SQLEngine, ConnectionProxy,
SQLTransaction. begin(), connection(), using(), commit(), rollback(),
set_current(), restore_previous().
====================================
Part II. Objectstore stuff. The ORM is a system for emitting SQL. it
uses the underlying SQL construction to do its work. There is the
objectstore module, the Mapper object and friends, and the Session
object, and thats mostly it. The Session object gets simplified,
begin()/commit() go away (theyll hang around in deprecated state), and
with regards to scoping, you get the basic terminology flush(),
Session(), mapper.using(), set_current() and restore_previous(), and
thats about it. so all the transactional terminology is removed from
objectstore/session since Session doesnt have much to do with
tranasctions, except that it uses one for its own flush() operation. if
you want your own SQL transaction demarcation, you use the engine stuff
as normal, so only one set of concepts to learn. The Session is no more
than a "bag of objects" which can "flush" SQL statements to the database.
# 5. simple objectstore work
obj = mapper.get(1)
obj.foo = 'bar'
objectstore.flush() # flush everything
obj2 = mapper.get(2)
obj2.foo = 'bar'
objectstore.flush(obj, obj2) # flush just two objects
# 6. multiple objectstore sessions. these are just two separate "bags
of objects". they arent doing anything explicit with the underlying
SQLEngines so are therefore using whatever ConnectionProxy's are current
for them.
obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session() # second session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush() # saves obj2
objectstore.flush() # saves obj1
# 7. multiple objectstore sessions, sharing the same identity map
space. this is similar to when we were using "begin()/commit()" with
the Session.
obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session(new_imap=False) # shares the identity map from the
current thread's session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush() # saves obj2
objectstore.flush() # saves obj1
# 8. setting a new Session to be the current thread's session, so that
all subsequent calls use it automatically.
obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session() # new session
objectstore.set_current(sess) # set it as current
try:
obj2 = mapper.get(2)
obj2.foo = 'bar'
objectstore.flush() # saves obj2
finally:
objectstore.restore_previous()
objectstore.flush() # saves obj1
I'm fine with this as long as the thread-local pattern can be completely
disabled.
# so really, what before was session.begin()/commit(), is now more or
less like
objectstore.set_current(Session(new_imap=False))/objectstore.flush()/objectstore.restore_previous().
I guess the use case for this was so that you can write a function that
works with some objects and then saves them, without affecting the
objects that were manipulated outside of the function...but probably
easier to just use the mapper.using() syntax. if people want *exactly*
what was begin/commit before, within a simple pair of functions as
opposed to this more verbose style, we can do that too...but they are
*not* going to be called begin/commit. too confusing !
============================
transactions + objectstore stuff
# 9. the Session is just an object that knows how to spit out a bunch of
UPDATE etc. statements. Transactions are used via the engine as from
Part I:
e = create_engine('foo')
trans = e.begin()
try:
obj1 = mapper.get(1)
obj1.foo = 'bar'
objectstore.flush()
trans.commit()
except:
trans.rollback()
# 10. A shortcut for when you want a particular ConnectionProxy (or set
of ConnectionProxies) to go into effect corresponding to a particular
Session. This ConnectionProxy gets used for all operations with this
Session which involve the given ConnectionProxy's underlying engine:
e = create_engine('...')
conn = e.connection()
sess = Session(connections=[conn])
Why multiple connections? How will it decide which connection to use? Or are
you thinking of multicasting the SQL to all connections? :)
obj = mapper.using(sess).get(3) # all SQL for conn's engine will use
the conn above for this operation
sess.flush() # commit using conn
objectstore.set_current(sess) # now all ORM operations on both Mapper
and Session use conn implicitly
objectstore.restore_previous()
I hope I haven't been too harsh. I'm trying to give honest feedback about what
I think will be useful. Personally, I'd rather be force to change some things
in my code that uses SA rather than be forced to live with warts forever for
the sake of backward compatibility.
One recurring theme we seem to be butting heads on is that you like to have
extra scoping code all over the place. While this may add a lot of flexibility,
it also adds a lot of complexity which is not needed in most cases. Things like
that are usually easy to add as an wrapper or extension when needed. This
approach also incurs no extra cost when they are not needed.
In summary, I like the new transaction interface, and changing commit() to
flush() on the Session is a must. However, I think the engine (schema?), table,
and mapper should stick to what they do best: provide a high-level wrapper
around the Python DBAPI. They should have very little or nothing to do with
managing connection, transaction, or session scope, which is the job of
frameworks built on top of them.
~ Daniel
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Sqlalchemy-users mailing list
Sqlalchemy-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users