On Mar 30, 4:42 pm, "Michael Bayer" <[email protected]> wrote:
> George V. Reilly wrote:
> > We're using SQLAlchemy sharding to partition accounts across a couple
> > of databases. We want to add more partitions, but first we need to
> > eliminate some unnecessary cross-partition queries.
> >
> > This works well most of the time, but we're finding that some queries
> > do not have a "value". These are all of the form
> >
> > SELECT shopping_list_items.version AS shopping_list_items_version
> > FROM shopping_list_items
> > WHERE shopping_list_items.account_id = :param_1
> > AND shopping_list_items.shopping_list_item_id = :param_2
> >
> > and :param1 is of the form _BindParamClause(u'%(63636624 param)s',
> > None, type_=UUID())
> >
> > Typically, I'm seeing this come out of the innards of SQLAlchemy,
> > as one of several queries triggered by, say, a session.merge().
>
> The only Query() I can see getting generated that would have non-valued
> bindparams would be during a _get(). The values should be present in
> query._params. If you need more info I can dig in to recall how the keys
> of that dictionary are formatted in this case.
Thanks, Michael. Here's what I came up with:
class FindShardableId(sqlalchemy.sql.ClauseVisitor):
def __init__(self, ids, key_fields, get_shard, params):
self.ids = ids
self.key_fields = key_fields
self.get_shard = get_shard
self.params = params
def visit_binary(self, binary):
if not self._check_side(binary, binary.left,
binary.right):
# Lazy load properties tend to be reversed, with the
constant on the left
self._check_side(binary, binary.right, binary.left)
def _check_side(self, binary, side, other_side):
if isinstance(side, sqlalchemy.Column) and side.name in
self.key_fields:
if binary.operator == sqlalchemy.sql.operators.eq:
value = getattr(other_side, "value", None)
if (value is None and
isinstance(other_side,
sqlalchemy.sql.expression._BindParamClause)):
value = self.params.get(other_side.key)
if value is not None:
self.ids.append(self.get_shard(value))
return True
elif binary.operator ==
sqlalchemy.sql.operators.in_op:
for bind in other_side.clauses:
self.ids.append(self.get_shard(bind.value))
return True
class QuerySharder(object):
def sessionmaker(self, **sessionmaker_args):
Session = sqlalchemy.orm.sessionmaker(
class_ = sqlalchemy.orm.shard.ShardedSession,
**sessionmaker_args)
Session.configure(
shards=self.shards,
shard_chooser=self._shard_chooser,
id_chooser=self._id_chooser,
query_chooser=self._query_chooser)
return Session
def _query_chooser(self, query):
ids = []
if query._criterion is not None:
FindShardableId(
ids,
set(["account_id", "account_guid"]),
lambda account_id:
self.shard_manager.shard_id_from_guid(account_id),
query._params
).traverse(query._criterion)
if len(ids) == 0:
logging.warn("\n\n! Executing query against all
shards; "
"this may not be optimal:\n\t{0}\n
\tParams: {1}\n".format(
str(query), str(query._params)))
return self.shards.keys()
else:
return ids
I really don't like the
isinstance(other_side,
sqlalchemy.sql.expression._BindParamClause))
in the middle of _check_side. Is there a cleaner way to do this?
I found that a combination of
* the above _check_side and two-sided visit_binary
* doing a better job of declaring ForeignKey relationships in Columns
* some explicit primaryjoins in calls to relation()
cleaned up all the cases where SA wasn't providing the ids in queries
Perhaps the sharding sample in SA 0.6 could be expanded?
--
/George V. Reilly, Seattle
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.