On Mon, Oct 20, 2014 at 2:44 PM, Mike Bayer wrote:
> As I’ve established oslo.db blueprints which will roll out new SQLAlchemy
> connectivity patterns for consuming applications within both API [1] and
> tests [2], one of the next big areas I’m to focus on is that of querying.
> If one looks at how SQLAlchemy ORM queries are composed across Openstack, the
> most prominent feature one finds is the prevalent use of the model_query()
> initiation function.This is a function that is implemented in a specific
> way for each consuming application; its purpose is to act as a factory for
> new Query objects, starting from the point of acquiring a Session, starting
> up the Query against a selected model, and then augmenting that Query right
> off with criteria derived from the given application context, typically
> oriented around the widespread use of so-called “soft-delete” columns, as
> well as a few other fixed criteria.
>
> There’s a few issues with model_query() that I will be looking to solve,
> starting with the proposal of a new blueprint. Key issues include that it
> will need some changes to interact with my new connectivity specification, it
> may need a big change in how it is invoked in order to work with some new
> querying features I also plan on proposing at some point (see
> https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Baked_Queries), and
> also it’s current form in some cases tends to slightly discourage the
> construction of appropriate queries.
>
> In order to propose a new system for model_query(), I have to do a survey of
> how this function is implemented and used across projects. Which is why we
> find me talking about Neutron today - Neutron’s model_query() system is a
> much more significant construct compared to that of all other projects. It
> is interesting because it makes clear some use cases that SQLAlchemy may very
> well be able to help with. It also seems to me that in its current form it
> leads to SQL queries that are poorly formed - as I see this, on one hand we
> can blame the structure of neutron’s model_query() for how this occurs, but
> on the other, we can blame SQLAlchemy for not providing more tools oriented
> towards what Neutron is trying to do. The use case Neutron has here is very
> common throughout many Python applications, but as yet I’ve not had the
> opportunity to address this kind of pattern in a comprehensive way.
>
> I first sketched out my concerns on a Neutron issue
> https://bugs.launchpad.net/neutron/+bug/1380823, however I was encouraged to
> move it over to the mailing list.
>
> Specifically with Neutron’s model_query(), we're talking here about the
> plugin architecture in neutron/db/common_db_mixin.py, where the
> register_model_query_hook() method presents a way of applying modifiers to
> queries. This system appears to be used by: db/external_net_db.py,
> plugins/ml2/plugin.py, db/portbindings_db.py,
> plugins/metaplugin/meta_neutron_plugin.py.
>
> What the use of the hook has in common in these cases is that a LEFT OUTER
> JOIN is applied to the Query early on, in anticipation of either the
> filter_hook or result_filters being applied to the query, but only
> *possibly*, and then even within those hooks as supplied, again only
> *possibly*. It's these two "*possiblies*" that leads to the use of LEFT OUTER
> JOIN - this extra table is present in the query's FROM clause, but if we
> decide we don't need to filter on it, the idea is that it's just a left outer
> join, which will not change the primary result if not added to what’s being
> filtered. And even, in the case of external_net_db.py, maybe we even add a
> criteria "WHERE IS NULL", that is doing a "not contains" off
> of this left outer join.
>
> The result is that we can get a query like this:
>
> SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL
>
> this can happen for example if using External_net_db_mixin, the outerjoin to
> ExternalNetwork is created, _network_filter_hook applies
> "expr.or_(ExternalNetwork.network_id != expr.null())", and that's it.
>
> The database will usually have a much easier time if this query is expressed
> correctly [3]:
>
>SELECT a.* FROM a INNER JOIN b ON a.id=b.aid
>
> the reason this bugs me is because the SQL output is being compromised as a
> result of how the plugin system is organized. Preferable would be a system
> where the plugins are either organized into fewer functions that perform all
> the checking at once, or if the plugin system had more granularity to know
> that it needs to apply an optional JOIN or not. My thoughts for new
> SQLAlchemy/oslo.db features are being driven largely by Neutron’s use case
> here.
>
> Towards my goal of proposing a better system of model_query(), along with
> Neutron’s heavy use of generically added criteria, I’ve put some thoughts
> down on a new SQLAlchemy feature which would also be backported