A better way of conveying my problem might be: `get()` returns any object which is in the identity map; I am having issues after the identity map is now populated with some objects which are 'partials' that were created via 'load_only' parameters. The benefits of using `get` to minimize a database query are outnumbered by dozens of selects to the database.
I'll try to re-explain everything with a more details below. On Sunday, May 13, 2018 at 11:41:56 AM UTC-4, Mike Bayer wrote: > > > Phase one is a query with lots of joinedload, and *also*, a > load_only() (Is that right? your previous email shows a joinedload + > load_only, this email OTOH says "phase 1, lots of eagerloading and > joinedloading", is that the one with the "load_only" ?) > > Phase 1 can have a load_only, usually not. The problem with the load_only is in the stuff that now happens before phase 1. Then, you say, the get() in phase 2 (which we assume is the get() you > refer towards in your previous email) hits objects from this > joinedload + loadonly 99% of the time, so 1% of the time they are from > a cache, which you haven't told me how objects get into that or what > state they are in. > That 1% are direct queries against the Database. The cache I'm talking about is SqlAlchemy's identity map. Phase-1: - SqlAlchemy queries PostgreSQL for the primary keys of objects under several criteria. - The app aggregates all the primary keys by object type, then selects the objects by the primary key, making extensive use of the joined/eager loading. - The generated SqlAlchemy objects are inspected and analyzed for foreign keys that are of interest, which are then loaded via the aggregated gets. - Batching selects like this enormously improved PostgreSQL performance by influencing how it loads/unloads and caches data. The queries only involve indexes, and then tables only get 1 (sometimes 2) operations with the primary index. Phase-2: - The objects are interacted with. Their attributes and relationships are accessed. - Sometimes this code needs to use a specific object that was not-necessarily loaded in Phase-1. SqlAlchemy's `get(primary_key)` is used to leverage the identity map, which would either return an existing object by primary key, or hit the database. This almost always results in a "cache hit" from the identity map, so no sql is emitted. Occasionally the object was not loaded, so sql must be emitted to select it. The current problem was created when a new "Phase-0" was introduced. In Phase-0, SqlAlchemy queries a number of objects with `load_only` specifying a small subset of their columns. If these objects are queried in Phase-2 via `get()`, the identity map usually returns the previously selected object with a small subset of the the columns; this causes sql to be emitted and data to load every time a previously untouched attributed or relationship is accessed for the bulk of the objects' columns and relationships. This results in dozens of sql selects. What I'd like to accomplish: not return any objects which were loaded in Phase-0 via the `get` queries in Phase-2. I'd use a second session to handle the Phase-0 objects (which are readonly), except that means two database connections. Loading the 'full' objects isn't a good option either - some of these objects span a half dozen database tables and are mapped back into a single object via association_proxy. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
