Michael Bayer wrote:
The SQLEngine architecture, as well as the objectstore and everything else, heavily relies upon thread-local storage to share resources. So far, there has not really been much concept of a "connection" outside of the pooling module since it was all built in such a way theres usually nowhere to pass a "connection" around, its just implicitly available when needed (and implicitly returned to the pool when not). But it is nice to say, "conn = engine.connect()", with a nice verb and a tangible thing to hold onto. So this proposal, while not ditching the basic "implicit" behavior of SQL statement execution as well as Session construction that I think is pretty popular, introduces some more "explicitness" when its desired to use concurrent SQL connections or transactions, or concurrent Sessions, or a combination of both. It also simplifies the Session interface and removes all transaction-like terminology from the ORM package. there is actually very little new here, its mostly a rearrangement of terminology.


Part I.  SQL construction, transactions, engines. note that we
arent dealing with the SA ORM at all, no mappers, objectstores,
they are not not involved here.
==============

# 1. engine. nothing new here. it represents a particular connection URL using
# the dialect (to use a Hibernate term) of a particular vendor's database.

    e = create_engine('foo')

    # SQL statements constructed against the engine use it implicitly to
# execute themselves, connections are pulled from the pool behind the scenes and released.
    mytable.update().execute()

# 2. transaction, the implicit way, slightly new but not much.

    trans = e.begin()
    try:
        mytable.update().execute()
        trans.commit()
    except:
        trans.rollback()

the "trans" object is called a SQLTransaction, which doesnt do much except provide the "commit" and "rollback" function, but there you go, the "tangible transaction object" to hold onto.

Nice. Simple and explicit (it eliminates the previous ambiguity of commit 
without begin).


# 3. the connection. this is "new" (its currenly called a SQLSession...but that name changes to ConnectionProxy) generally used to say "do this with this explicit resource". this is to make everyones needs for what I was calling "nested" stuff possible, but youre right its not really "nested", so now theres a tangible thing to hold onto (well there has been for several weeks now, but now its the primary way to do it).

    # this is a ConnectionProxy object.  it represents a single unique 
connection from the
    # pool, which it grabs when needed, or when a transaction is started it 
holds onto it.
    # this object is used automatically with the implicit stuff in #1, #2
    conn = e.connection()


Is that supposed to be e.connect()? You had mentioned "a nice verb".

The ConnectionProxy should probably have a close() method that would explicitly release 
the connection back to the pool. This is the type of thing that keeps programs tight and 
squeaky (i.e. deallocating unused resources when they are no longer needed). Right now 
CPython immediately cleans up objects when they go out of scope, but the python-devs are 
always very adamant about not depending on this since it could change if a new garbage 
collection scheme were adopted. Not to mention, objects often don't go out of scope until 
long after their needed anyway. This type of "implicit" resource management is 
how you get memory leaks in a dynamic language.


# 3a. transactions on the connection. same deal, same SQLTransaction object:

    trans = conn.begin()
    trans.commit()

# 3b. but heres the new thing. since we have this nice tangible ConnectionProxy, lets tangibly use it. # We stick "using()" on the base ClauseElement (which is the base of all Table/Select/Alias/etc objects),
# so you can use different connections all over the place.

    mytable.using(conn).select().execute()
    mytable.select().using(conn).execute()

What's the difference between these two? I think it should only allow the 
first. But then again, I'd rather just have this:

Query(mytable, connection=conn).select()

and don't bother with using() on the table.


# or maybe we do the keyword thing
    mytable.select(connection=conn).execute()

Nah, aren't select params usually clause elements?


# 3c. or if you want the "implicit" stuff to use this ConnectionProxy, to bypass current transactional state, introduce new tranasctional state, or even connect to some other database, and have it affect everything going forward (on the current thread), do this:

    engine.set_current(connection)  # now everyone uses 'connection'
    engine.restore_previous()   # put the old one back.

I see what your doing with this, but does it have to be part of the engine? I 
would say the engine's job is to create new connections and to compile queries 
into a particular dialect of SQL. It looks like the engine also manages 
connections and transactions. The connection/transaction manager should reside 
in something like the SQLSession, whose job is to track connections and 
transactions. Could set_current/restore_previous reside there instead? Try to 
keep the engine doing just a few things well rather than everything for 
everyone.


# 3d. 'using' can take engines or connections:

    ...

    t.select().using(e2).execute()

    t.using(e1.connection()).select().execute()

What does this mean? Will using() pull a connection out of the engine on each 
execute() or will it execute every query on the same connection? This API is 
mushy: it's starting to seem like you can pass any object into any function and 
you just might get something back. However, all this versatility comes at a 
cost: there are so many different combinations of functions and parameters that 
work together and I can't remember which one does what. Also, debugging becomes 
harder because there are more ways to pass in something that works but returns 
the wrong result or executes it in the wrong context. Since the using() method 
already takes a connection and it's easy enough to get the connection from the 
engine, I vote to just support that version. Keep it simple.


# 3e. switch the current engine to use a different engine's connection. this is a hacky thing. There is a user who is writing his own "clustered database" solution involving data stored in redundant tables across multiple DBs, he needed something exactly like this:

    e1 = create_engine('postgres://foo')
    e2 = create_engine('postgres://bar')

e1.set_current(e2.connection()) # now everyone that uses e1, secretly uses e2
    e1.restore_previous()  # put it back

# in theory, things should be designed so that the "dialect" of e2 gets used also if you swap it into e1. might need a little more internal rearranging for that.

Eeek! I don't like that at all...which engine/connection/transaction/whatever 
to I have here? If someone really needs this then they can implement it as a 
wrapper around engine (I'm assuming that would be fairly trivial or you 
wouldn't have included it here).



# 4. natural nesting behavior

As always, begin/commit/rollback uses the "outermost nested" idea, where "nested" calls to begin/commit dont affect the single "outermost" transaction.

e = create_engine('foo')
trans = e.begin()
try:
    mytable.update().execute()

    # then in some other function..
    trans2 = e.begin()
    try:
        myothertable.update.execute()
        trans2.commit()   # doesnt commit
    except:
        trans2.rollback()   # rolls trans2, trans

    trans.commit()   # commits mytable, myothertable
except:
    trans.rollback()  # rolls back trans

This is good. All "begin()" methods need to return a transaction object that 
can either be committed or rolled back. This removes a critical ambiguity.


So nesting just means that kind of nesting. The other thing i was calling nesting, yes, its just two concurrent connections, its not really "nesting", for that you just use a second ConnectionProxy, and you can optionally use set_current/restore_previous to make it take over the "implicit" scope.

I'm not a fan of this implicit scope that used all over SA. It's too much magic. It makes it hard 
to debug because things will seem to work even if they're not happening in the correct scope. 
There's no easy way to tell that you've wandered out of the correct scope because it just defaults 
to some "global" scope and goes merrily on it's way. I'd rather have an exception if I'm 
not executing in the correct scope: fail early. Allow optional use of something like the 
AutoConnectEngine when we want to "make a new one if one does not already exist" (even 
have it be the default if you want, but allow it to be substituted with something more strict).


So to sum up Part I, heres the full vocabulary to learn regarding connections, engines, transactions: SQLEngine, ConnectionProxy, SQLTransaction. begin(), connection(), using(), commit(), rollback(), set_current(), restore_previous().

====================================

Part II. Objectstore stuff. The ORM is a system for emitting SQL. it uses the underlying SQL construction to do its work. There is the objectstore module, the Mapper object and friends, and the Session object, and thats mostly it. The Session object gets simplified, begin()/commit() go away (theyll hang around in deprecated state), and with regards to scoping, you get the basic terminology flush(), Session(), mapper.using(), set_current() and restore_previous(), and thats about it. so all the transactional terminology is removed from objectstore/session since Session doesnt have much to do with tranasctions, except that it uses one for its own flush() operation. if you want your own SQL transaction demarcation, you use the engine stuff as normal, so only one set of concepts to learn. The Session is no more than a "bag of objects" which can "flush" SQL statements to the database.

# 5.  simple objectstore work

obj = mapper.get(1)
obj.foo = 'bar'
objectstore.flush()  # flush everything

obj2 = mapper.get(2)
obj2.foo = 'bar'
objectstore.flush(obj, obj2)  # flush just two objects

# 6. multiple objectstore sessions. these are just two separate "bags of objects". they arent doing anything explicit with the underlying SQLEngines so are therefore using whatever ConnectionProxy's are current for them.

obj1 = mapper.get(1)
obj1.foo = 'bar'

sess = Session()  # second session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush()  # saves obj2

objectstore.flush()  # saves obj1

# 7. multiple objectstore sessions, sharing the same identity map space. this is similar to when we were using "begin()/commit()" with the Session.

obj1 = mapper.get(1)
obj1.foo = 'bar'

sess = Session(new_imap=False) # shares the identity map from the current thread's session
obj2 = mapper.using(sess).get(2)
obj2.foo = 'bar'
sess.flush()  # saves obj2

objectstore.flush()  # saves obj1

# 8. setting a new Session to be the current thread's session, so that all subsequent calls use it automatically.

obj1 = mapper.get(1)
obj1.foo = 'bar'
sess = Session()  # new session
objectstore.set_current(sess)  # set it as current
try:
    obj2 = mapper.get(2)
    obj2.foo = 'bar'
    objectstore.flush() # saves obj2
finally:
    objectstore.restore_previous()
objectstore.flush()  # saves obj1

I'm fine with this as long as the thread-local pattern can be completely 
disabled.


# so really, what before was session.begin()/commit(), is now more or less like objectstore.set_current(Session(new_imap=False))/objectstore.flush()/objectstore.restore_previous(). I guess the use case for this was so that you can write a function that works with some objects and then saves them, without affecting the objects that were manipulated outside of the function...but probably easier to just use the mapper.using() syntax. if people want *exactly* what was begin/commit before, within a simple pair of functions as opposed to this more verbose style, we can do that too...but they are *not* going to be called begin/commit. too confusing !

============================

transactions + objectstore stuff

# 9. the Session is just an object that knows how to spit out a bunch of UPDATE etc. statements. Transactions are used via the engine as from Part I:

e = create_engine('foo')
trans = e.begin()
try:
    obj1 = mapper.get(1)
    obj1.foo = 'bar'
    objectstore.flush()
    trans.commit()
except:
    trans.rollback()

# 10. A shortcut for when you want a particular ConnectionProxy (or set of ConnectionProxies) to go into effect corresponding to a particular Session. This ConnectionProxy gets used for all operations with this Session which involve the given ConnectionProxy's underlying engine:

e = create_engine('...')
conn = e.connection()
sess = Session(connections=[conn])

Why multiple connections? How will it decide which connection to use? Or are 
you thinking of multicasting the SQL to all connections? :)


obj = mapper.using(sess).get(3) # all SQL for conn's engine will use the conn above for this operation
sess.flush()  # commit using conn

objectstore.set_current(sess) # now all ORM operations on both Mapper and Session use conn implicitly
objectstore.restore_previous()


I hope I haven't been too harsh. I'm trying to give honest feedback about what 
I think will be useful. Personally, I'd rather be force to change some things 
in my code that uses SA rather than be forced to live with warts forever for 
the sake of backward compatibility.

One recurring theme we seem to be butting heads on is that you like to have 
extra scoping code all over the place. While this may add a lot of flexibility, 
it also adds a lot of complexity which is not needed in most cases. Things like 
that are usually easy to add as an wrapper or extension when needed. This 
approach also incurs no extra cost when they are not needed.

In summary, I like the new transaction interface, and changing commit() to 
flush() on the Session is a must. However, I think the engine (schema?), table, 
and mapper should stick to what they do best: provide a high-level wrapper 
around the Python DBAPI. They should have very little or nothing to do with 
managing connection, transaction, or session scope, which is the job of 
frameworks built on top of them.

~ Daniel


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Sqlalchemy-users mailing list
Sqlalchemy-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users

Reply via email to