Dear Mike, thank you, as usual, for the fast response.
Your comment made me think. Actually, I was not adding things to the session directly. I revised my code, and I believe the behaviour I'm describing is related to my application of the UniqueObject pattern described in the documentation. What I'm doing is having this mixin:

"""
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/UniqueObject
"""
import abc


def _unique(session, instance, hash_key, query_func):
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}

    key = (type(instance), hash_key())
    if key in cache:
        return cache[key]
    else:
        with session.no_autoflush:
            q = query_func(session)
            obj = q.first()
            if not obj:
                obj = instance
                if isinstance(instance, UniqueAssociationProxy):
                    session.add(obj)
        cache[key] = obj
        return obj


class UniqueAssociationProxy(object):
    @abc.abstractmethod
    def unique_hash(self, *arg, **kw):
        raise NotImplementedError()

    @abc.abstractmethod
    def unique_filter(self, query, *arg, **kw):
        raise NotImplementedError()

    @abc.abstractmethod
    def ensure_unicity(self, session):
        raise NotImplementedError()

applied to the many-to-many mappings:

class ConstrainedSatellite(UniqueAssociationProxy):
    """This class is meant to be mapped with SQLAlchemy in the
    association object."""

    def __init__(self, constraint_name, value):
        # Temporarily assign the name to a string in order to ensure
        # uniqueness: https://goo.gl/LbJ7wf
        self._constraint_name = constraint_name
        self.value = value

    @property
    def name(self):
        if self.constraint is not None:
            return self.constraint.name
        else:
            return self._constraint_name

    def unique_hash(self):
        return self._constraint_name

    def unique_filter(self, session):
        return session.query(Constraint).filter_by(
            name=self._constraint_name)

    def ensure_unicity(self, session):
        instance = Constraint(name=self._constraint_name)
        self.constraint = _unique(session, instance, self.unique_hash,
                                  self.unique_filter)
        return self


class WeightedSatellite(UniqueAssociationProxy):
    """This class is meant to be mapped with SQLAlchemy in the
    association proxy with the Weights."""

    def __init__(self, weight_name, value):
        # Temporarily assign the name to a string in order to ensure
        # uniqueness: https://goo.gl/LbJ7wf
        self._weight_name = weight_name
        self.value = value

    @property
    def name(self):
        if self.weight is not None:
            return self.weight.name
        else:
            return self._weight_name

    def unique_hash(self):
        return self._weight_name

    def unique_filter(self, session):
        return session.query(Weight).filter_by(name=self._weight_name)

    def ensure_unicity(self, session):
        instance = Weight(name=self._weight_name)
        self.weight = _unique(session, instance, self.unique_hash,
                              self.unique_filter)
        return self

(The mappings are reported in the previous message.) The uniqueness is then ensured by the `_unique` helper shown above. In this way, if I understand correctly, I'm making SQLAlchemy store everything at instance initialization time. Could you please help me understand how to improve the situation? Thanks in advance.

On Thursday, January 4, 2018 at 6:05:38 PM UTC+1, Ruben Di Battista wrote:
>
> Hello,
> I'm writing a satellite passage scheduler that has a database persistence
> layer to store the scheduled passages.
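For what it's worth, here is a stdlib-only sketch of the caching behaviour I expect from `_unique`: the second call with the same key should come back from the session-level cache, and only one `add()` should reach the session. `FakeSession` and `FakeQuery` are stand-ins I invented just for this sketch (they only mimic the few attributes the recipe touches), not part of my real code:

```python
import contextlib


class FakeQuery(object):
    """Stand-in for a Query: .first() returns the 'already persisted' row."""
    def __init__(self, result):
        self._result = result

    def first(self):
        return self._result


class FakeSession(object):
    """Stand-in for a Session: records add() calls, no-op no_autoflush."""
    def __init__(self):
        self.added = []

    @property
    def no_autoflush(self):
        # The real Session.no_autoflush is a context manager;
        # a no-op context is enough here.
        return contextlib.nullcontext()

    def add(self, obj):
        self.added.append(obj)


def _unique(session, instance, hash_key, query_func):
    # Same logic as the recipe above, minus the mixin isinstance check.
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}
    key = (type(instance), hash_key())
    if key in cache:
        return cache[key]
    with session.no_autoflush:
        obj = query_func(session).first()
        if not obj:
            obj = instance
            session.add(obj)
    cache[key] = obj
    return obj


class Constraint(object):
    def __init__(self, name):
        self.name = name


session = FakeSession()
c1 = _unique(session, Constraint('min_elevation'),
             lambda: 'min_elevation', lambda s: FakeQuery(None))
c2 = _unique(session, Constraint('min_elevation'),
             lambda: 'min_elevation', lambda s: FakeQuery(None))
assert c1 is c2                  # second call served from the cache
assert len(session.added) == 1   # only one add() reached the session
```

So the `add()` itself happens at most once per key; what surprises me is that it happens at all during what should be a read-only phase.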
>
> The DB schema is organized as follows:
>
> - A table storing the satellites (using the NORAD number as primary key)
> - A table storing the ground stations where to compute the passages of
>   the satellites
> - A table storing the passages of these satellites, with two foreign keys
>   linking each passage to a ground station and a satellite
> - A table storing all the types of constraints a satellite can have
> - A table storing all the types of weights (used to perform the
>   scheduling) that can be assigned to each satellite
>
> Then I configured some association proxies (with the related Unique
> Object pattern) in order to assign the weights and the constraints to
> each satellite as a dictionary:
>
> sat.constraints['min_elevation'] = 10
>
> The details of the relationships are here below:
>
> # Relationship definitions
> orm.mapper(Satellite, satellite, properties={
>     'passages': orm.relationship(Passage,
>                                  backref='satellite',
>                                  order_by=passage.c.aos,
>                                  cascade='all, delete-orphan'),
>     '_constraints': orm.relationship(
>         ConstrainedSatellite, backref='satellite',
>         collection_class=orm.collections.attribute_mapped_collection(
>             'name'),
>         cascade='all, delete-orphan'),
>     '_weights': orm.relationship(
>         WeightedSatellite, backref='satellite',
>         collection_class=orm.collections.attribute_mapped_collection(
>             'name'),
>         lazy='joined',
>         cascade='all, delete-orphan'),
>     '_tle': satellite.c.tle
> })
>
> orm.mapper(Constraint, constraint, properties={
>     'satellites': orm.relationship(ConstrainedSatellite,
>                                    backref='constraint')
> })
>
> orm.mapper(Weight, weight, properties={
>     'satellites': orm.relationship(WeightedSatellite, backref='weight')
> })
>
> orm.mapper(ConstrainedSatellite, constraint_satellite)
> orm.mapper(WeightedSatellite, weight_satellite)
> orm.mapper(PassageData, passage_data)
>
> orm.mapper(Passage, passage, properties={
>     'angles': orm.relationship(PassageData, backref='passage',
>                                order_by=passage_data.c.time,
>                                cascade='all, delete-orphan')
>     },
>     confirm_deleted_rows=False
> )
>
> orm.mapper(GroundStation, ground_station, properties={
>     'passages': orm.relationship(Passage, backref='ground_station',
>                                  order_by=passage.c.aos,
>                                  cascade='all, delete-orphan')
> })
>
> # Association Proxies
> Satellite.constraints = association_proxy(
>     '_constraints', 'value', creator=constrained_sat_creator
> )
>
> Satellite.weights = association_proxy(
>     '_weights', 'value', creator=weighted_sat_creator
> )
>
> ConstrainedSatellite.constraint_name = association_proxy('constraint',
>                                                          'name')
> WeightedSatellite.weight_name = association_proxy('weight', 'name')
>
> From the tests everything is working as expected. The problem is that I'm
> getting some performance issues while performing the scheduling procedure:
> this procedure needs to read the `constraints` and `weights` associated to
> each satellite in order to select the best passages among all the possible
> ones on each ground station. While reading the values of the `constraints`
> and `weights`, SQLAlchemy needs to store the Passage objects in the
> database. So all the passages are stored in the database during the
> propagation of all the possible passages, and then I need to manually
> expunge or delete from the DB the passages that haven't been scheduled.
>
> What I would like to achieve is to "shut down" the SQLAlchemy persistence
> after selecting the ground stations and the satellites for which to
> perform the optimization, so that the optimization procedure runs only on
> Python objects, without SQLAlchemy storing them in the DB, and then to
> efficiently store just the optimized, smaller list of passages computed
> by the scheduling algorithm.
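
(To clarify what I mean by persisting only the selected subset, here is a plain-Python sketch with no SQLAlchemy involved; `Passage` and the selection rule are bare stand-ins, not my real classes. It also uses a set for the membership test, which avoids the O(n^2) list scan my current cleanup loop does:)

```python
class Passage(object):
    """Bare stand-in for the real mapped Passage class."""
    def __init__(self, aos, score):
        self.aos = aos
        self.score = score


def schedule(passages):
    # Stand-in for the real scheduling algorithm: keep high-score passes.
    return [p for p in passages if p.score > 0.5]


# Build all candidate passages as detached plain objects (scores 0.0-0.9).
all_passages = [Passage(aos=i, score=i / 10.0) for i in range(10)]
scheduled = schedule(all_passages)

# Membership test against a set of ids is O(1) per lookup, instead of
# the O(n) `passage not in scheduled_passages` list scan:
scheduled_ids = {id(p) for p in scheduled}
discarded = [p for p in all_passages if id(p) not in scheduled_ids]

# Only `scheduled` would ever be handed to session.add_all();
# `discarded` is never persisted, so nothing has to be DELETEd later.
assert len(scheduled) + len(discarded) == len(all_passages)
```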
>
> Currently the scheduling script is like this (schematically):
>
> # Create a DB session
> session_factory = orm.sessionmaker(db)
> session = orm.scoped_session(session_factory)
>
> # Retrieve satellites from DB
> sats = session.query(Satellite).all()
>
> # Retrieve ground stations
> ground_stations = session.query(GroundStation).all()
>
> # Init Scheduler instance
> scheduler = Scheduler(sats, ground_stations, start_day)
>
> # This method generates all the possible passages of all satellites on
> # all ground stations. It needs to read `constraints` and `weights` for
> # each satellite to perform what it needs. Currently all the passages
> # get stored in the DB.
> all_passages = scheduler.propagate()
>
> # This method selects, from the totality of all_passages, a subset
> # that fulfills all the constraints.
> scheduled_passages = scheduler.schedule(all_passages, iterations, pool)
>
> # Remove from session passages not scheduled
> # === This is the thing I would like to avoid since it's slow ===
> for passage in all_passages:
>     if passage not in scheduled_passages:
>         session.expunge(passage)
>         session.delete(passage)
>
> session.commit()
> session.close()
>
> As you can see, I need to manually expunge or delete the passages that
> are not scheduled by the algorithm. If the number of passages is huge,
> that means performing a DELETE query for each passage, which is slow.
> What I would like to achieve is to have the `scheduler.propagate` and
> `scheduler.schedule` methods perform operations only on Python objects
> (fully loaded with all the correct parameters from the DB), avoiding the
> need to store and then delete all the passages in the DB. Can I "shut
> down" the DB persistence temporarily?
>
> Thanks for your help,
> RdB
>
> PS: I'm on MySQL (MariaDB)

-- 
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.

You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
