On Sunday, March 11, 2012 10:56:44 PM UTC-7, Michael Bayer wrote:
>
>
> On Mar 11, 2012, at 8:03 PM, George V. Reilly wrote:
>
> > We ran into a nasty problem with sharding a while back. We came up with 
> an effective workaround, but we don't fully understand the problem. Perhaps 
> someone here can provide more insight.
> > 
> > We shard requests to our production MySQL databases using 
> sqlalchemy.ext.horizontal_shard.ShardedSession. Our query_chooser uses  a 
> consistent hashing scheme on our account_ids to compute shard_ids. In 
> almost all cases, we know the account_id before performing a query, 
> therefore a query goes directly to the correct shard and only to that 
> shard. In the other cases, we're typically trying to find the account_id(s) 
> given some other key.
> > 
> > The nasty problem arose thus. We made a cross-shard query, which 
> apparently opened a database connection to more than one of the sharded 
> databases. The query returned an account_id and our associated SQLAlchemy 
> Account object. We then added data to the account and committed it — all on 
> the same scoped_session. Result: the thread (process?) blocks on the 
> connections to the other databases.
> > 
> > The workaround is to call commit() on the session after performing the 
> cross-shard query, and then call scoped_session() to make a fresh session 
> before adding data to the account.
> > 
> > Anyone understand what the underlying problem was? Is it in SQLAlchemy, 
> MySQL-python, or some lower level?
> > 
> > The pattern that we're moving towards with these cross-shard queries is 
> to perform them in a distinct session using a contextmanager which commits 
> the session. Account_ids and other keys are returned from the 
> contextmanager, but SQLAlchemy objects are not. One (or more) of our 
> SQLAlchemy Account objects are then created in the main session. I'm 
> worried about the >1 case, since those accounts may be distributed across 
> several shards.
>
>
> the Session works like this:   each time you execute() on a particular 
> Connection, that Connection is added to the state of the 
> SessionTransaction, and remains open in a transaction until the Session 
> itself is committed, rolled back, or closed.    It's in this way that a 
> Session coordinates multiple operations across many databases, and if 
> twophase=True it will also call prepare() on all those connections before 
> the commit.
>
> So when using ShardedSession, if you hit three databases, they're all in 
> that session's state.
>
> Let's say you wanted to close out the first two.  That may be fine, but 
> how do you know that those connections aren't in the Session's state due to 
> previous operations on those databases ?   So it's something of a tricky 
> problem, to say that you'd like to scan multiple database nodes with a 
> certain SELECT statement, but then once you find the node you care about, 
> close out the previous nodes.   You'd have to check first that they weren't 
> already there.     
>
> This is all doable though is not public API, to optimize this operation 
> would mean you'd need to peek inside of Session.transaction._connections 
> before and after the operation, then rollback() those connections you know 
> you don't want anymore and remove them from 
> Session.transaction._connections.
>
> So that's what's going on, it's just I think if you try yo experiment with 
> closing out individual connections from Session.transaction._connections it 
> may become apparent that this is a bit of an intricate case.
>
> Overall, I'd likely be using distinct Session objects explicitly in any 
> case within a horizontal sharding situation; assuming I can make sure that 
> the scope of a certain request is going to be focused entirely on just one 
> of those shards, not unlike the solution you came up with.    
>
Thanks, Michael. 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/xvrlAHxrN3AJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to