As I’ve established oslo.db blueprints which will roll out new SQLAlchemy 
connectivity patterns for consuming applications within both API [1] and tests 
[2], one of the next big areas I’m to focus on is that of querying.   If one 
looks at how SQLAlchemy ORM queries are composed across Openstack, the most 
prominent feature one finds is the prevalent use of the model_query() 
initiation function.    This is a function that is implemented in a specific 
way for each consuming application; its purpose is to act as a factory for new 
Query objects, starting from the point of acquiring a Session, starting up the 
Query against a selected model, and then augmenting that Query right off with 
criteria derived from the given application context, typically oriented around 
the widespread use of so-called “soft-delete” columns, as well as a few other 
fixed criteria.

There’s a few issues with model_query() that I will be looking to solve, 
starting with the proposal of a new blueprint.   Key issues include that it 
will need some changes to interact with my new connectivity specification, it 
may need a big change in how it is invoked in order to work with some new 
querying features I also plan on proposing at some point (see 
https://wiki.openstack.org/wiki/OpenStack_and_SQLAlchemy#Baked_Queries), and 
also it’s current form in some cases tends to slightly discourage the 
construction of appropriate queries.

In order to propose a new system for model_query(), I have to do a survey of 
how this function is implemented and used across projects.  Which is why we 
find me talking about Neutron today - Neutron’s model_query() system is a much 
more significant construct compared to that of all other projects.   It is 
interesting because it makes clear some use cases that SQLAlchemy may very well 
be able to help with.  It also seems to me that in its current form it leads to 
SQL queries that are poorly formed - as I see this, on one hand we can blame 
the structure of neutron’s model_query() for how this occurs, but on the other, 
we can blame SQLAlchemy for not providing more tools oriented towards what 
Neutron is trying to do.   The use case Neutron has here is very common 
throughout many Python applications, but as yet I’ve not had the opportunity to 
address this kind of pattern in a comprehensive way.   

I first sketched out my concerns on a Neutron issue 
https://bugs.launchpad.net/neutron/+bug/1380823, however I was encouraged to 
move it over to the mailing list.

Specifically with Neutron’s model_query(), we're talking here about the plugin 
architecture in neutron/db/common_db_mixin.py, where the 
register_model_query_hook() method presents a way of applying modifiers to 
queries. This system appears to be used by: db/external_net_db.py, 
plugins/ml2/plugin.py, db/portbindings_db.py, 
plugins/metaplugin/meta_neutron_plugin.py.

What the use of the hook has in common in these cases is that a LEFT OUTER JOIN 
is applied to the Query early on, in anticipation of either the filter_hook or 
result_filters being applied to the query, but only *possibly*, and then even 
within those hooks as supplied, again only *possibly*. It's these two 
"*possiblies*" that leads to the use of LEFT OUTER JOIN - this extra table is 
present in the query's FROM clause, but if we decide we don't need to filter on 
it, the idea is that it's just a left outer join, which will not change the 
primary result if not added to what’s being filtered. And even, in the case of 
external_net_db.py, maybe we even add a criteria "WHERE <extra model id> IS 
NULL", that is doing a "not contains" off of this left outer join.

The result is that we can get a query like this:

    SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS NOT NULL

this can happen for example if using External_net_db_mixin, the outerjoin to 
ExternalNetwork is created, _network_filter_hook applies 
"expr.or_(ExternalNetwork.network_id != expr.null())", and that's it.

The database will usually have a much easier time if this query is expressed 
correctly [3]:

   SELECT a.* FROM a INNER JOIN b ON a.id=b.aid

the reason this bugs me is because the SQL output is being compromised as a 
result of how the plugin system is organized. Preferable would be a system 
where the plugins are either organized into fewer functions that perform all 
the checking at once, or if the plugin system had more granularity to know that 
it needs to apply an optional JOIN or not.   My thoughts for new 
SQLAlchemy/oslo.db features are being driven largely by Neutron’s use case here.

Towards my goal of proposing a better system of model_query(), along with 
Neutron’s heavy use of generically added criteria, I’ve put some thoughts down 
on a new SQLAlchemy feature which would also be backported to oslo.db. The 
initial sketch is at 
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event,
 and the main idea is that Query would include a system by which we can ask 
questions about what it selects from and what it joins to and get definitive 
answers, with this use case in mind.    It would allow functions like “we need 
to INNER JOIN to X but only if our query is already asking for something about 
X” to be easy to write, and would allow Neutron’s system (as well as others) to 
be simplified and generate better queries.  A new event hook would also be 
supplied that allows for a clean place that these additional criteria can be 
augmented on a Query, without the need for the Query to be “created” from a 
factory such as model_query(), e.g. the feature would integrate transparently 
(somehow) with the system I’m proposing separately in [1], allowing the Query() 
API to be used fully without encouraging queries that are only against “a 
single model class".

So as is customary before I start proposing changes to consuming projects, I’m 
putting this out here to get any initial comments on this; such as, if someone 
could give me some background on Neutron’s model_query system, if it is used by 
any plugins that aren’t in the neutron source tree I have to know about, and if 
there is agreement that we’d be better off trying to use INNER JOIN and EXISTS 
appropriately, rather than “cheating” with an OUTER JOIN the way it is now.   
It may be the case that the queries right now are against small datasets or are 
otherwise taking advantage of MySQL’s planner so that they work out fine 
anyway, however in [3] I illustrate how relying on OUTER JOINs generically can 
often lead to poorer performance; if it isn’t a problem right now in Neutron, 
the pattern that’s used here is still less than ideal as it does not allow full 
control of how the SQL is rendered compared to what is needed, and I’d like to 
propose systems that are both more compatible with future needs (such as baked 
query) and which also can do a better job.


[1] make EngineFacade a Facade: https://review.openstack.org/#/c/125181/    

[2] add long-lived transactionalized DB fixtures: 
https://review.openstack.org/#/c/117335/

[3] some background on how INNER vs OUTER join can be a big deal in many cases 
(more often than not).   I tested with a schema containing two tables: “a" and 
“b”.  “a" has 10000 rows and “b" has 1M.  The “b" rows are linked via indexed 
foreign key to a subset of rows in “a".  Comparing the two queries, the OUTER 
JOIN query fails to use the index on “b.aid”:

 mysql> EXPLAIN SELECT a.* FROM a JOIN b ON a.id=b.aid;
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows 
| Extra |
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 | 1 | SIMPLE | b | index | b_aid | b_aid | 5 | NULL | 999000 | Using where; 
Using index |
 | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+--------------------------+
 2 rows in set (0.00 sec)

 mysql> EXPLAIN SELECT a.* FROM a LEFT OUTER JOIN b ON a.id=b.aid WHERE b.id IS 
NOT NULL;
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 | id | select_type | table | type | possible_keys | key | key_len | ref | rows 
| Extra |
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 | 1 | SIMPLE | b | ALL | PRIMARY,b_aid | NULL | NULL | NULL | 999000 | Using 
where |
 | 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 4 | test.b.aid | 1 | NULL |
 
+----+-------------+-------+--------+---------------+---------+---------+------------+--------+-------------+
 2 rows in set (0.00 sec)


Postgresql may have different results here, but in general OUTER JOINs are a 
common source of poor performance, and I think we should have ORM patterns in 
place where there is no tendency for SQL to be pushed in one direction or 
another based on code organization.





_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to