Dear Mike, thank you, as usual, for the fast response.
Your comment made me think. Actually, I was not adding things to the session directly. I revised my code, and I believe the behaviour I'm describing is related to my application of the UniqueObject pattern described in the documentation. What I'm doing is having this mixin:

"""
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/UniqueObject
"""
import abc


def _unique(session, instance, hash_key, query_func):
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}

    key = (type(instance), hash_key())
    if key in cache:
        return cache[key]
    else:
        with session.no_autoflush:
            q = query_func(session)
            obj = q.first()
            if not obj:
                obj = instance
                if isinstance(instance, UniqueAssociationProxy):
                    session.add(obj)
        cache[key] = obj
        return obj


class UniqueAssociationProxy(object):
    @abc.abstractmethod
    def unique_hash(self, *arg, **kw):
        raise NotImplementedError()

    @abc.abstractmethod
    def unique_filter(self, query, *arg, **kw):
        raise NotImplementedError()

    @abc.abstractmethod
    def ensure_unicity(self, session):
        raise NotImplementedError()

applied to the many-to-many mappings:

class ConstrainedSatellite(UniqueAssociationProxy):
    """This class is meant to be mapped with SQLAlchemy in the
    association object."""

    def __init__(self, constraint_name, value):
        # Temporarily assign the name to a string in order to ensure
        # uniqueness: https://goo.gl/LbJ7wf
        self._constraint_name = constraint_name
        self.value = value

    @property
    def name(self):
        if self.constraint is not None:
            return self.constraint.name
        else:
            return self._constraint_name

    def unique_hash(self):
        return self._constraint_name

    def unique_filter(self, session):
        return session.query(Constraint).filter_by(
            name=self._constraint_name)

    def ensure_unicity(self, session):
        instance = Constraint(name=self._constraint_name)
        self.constraint = _unique(session, instance, self.unique_hash,
                                  self.unique_filter)
        return self


class WeightedSatellite(UniqueAssociationProxy):
    """This class is meant to be mapped with SQLAlchemy in the
    association proxy with the Weights."""

    def __init__(self, weight_name, value):
        # Temporarily assign the name to a string in order to ensure
        # uniqueness: https://goo.gl/LbJ7wf
        self._weight_name = weight_name
        self.value = value

    @property
    def name(self):
        if self.weight is not None:
            return self.weight.name
        else:
            return self._weight_name

    def unique_hash(self):
        return self._weight_name

    def unique_filter(self, session):
        return session.query(Weight).filter_by(name=self._weight_name)

    def ensure_unicity(self, session):
        instance = Weight(name=self._weight_name)
        self.weight = _unique(session, instance, self.unique_hash,
                              self.unique_filter)
        return self

(The mappings are reported in the previous message.) The uniqueness is then ensured by the `_unique` helper shown above. In this way, if I understand correctly, I'm making SQLAlchemy store everything at instance initialization time. Could you please help me understand how to improve the situation? Thanks in advance.

On Thursday, January 4, 2018 at 6:05:38 PM UTC+1, Ruben Di Battista wrote:
>
> Hello,
> I'm writing a satellite passage scheduler that has a database persistence
> layer to store the scheduled passages.
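For what it's worth, here is a stdlib-only sketch of the caching behaviour I expect from `_unique`: the second call with the same key should come back from the session-level cache, and only one `add()` should reach the session. `FakeSession` and `FakeQuery` are stand-ins I invented just for this sketch (they only mimic the few attributes the recipe touches), not part of my real code:

```python
import contextlib


class FakeQuery(object):
    """Stand-in for a Query: .first() returns the 'already persisted' row."""
    def __init__(self, result):
        self._result = result

    def first(self):
        return self._result


class FakeSession(object):
    """Stand-in for a Session: records add() calls, no-op no_autoflush."""
    def __init__(self):
        self.added = []

    @property
    def no_autoflush(self):
        # The real Session.no_autoflush is a context manager;
        # a no-op context is enough here.
        return contextlib.nullcontext()

    def add(self, obj):
        self.added.append(obj)


def _unique(session, instance, hash_key, query_func):
    # Same logic as the recipe above, minus the mixin isinstance check.
    cache = getattr(session, '_unique_cache', None)
    if cache is None:
        session._unique_cache = cache = {}
    key = (type(instance), hash_key())
    if key in cache:
        return cache[key]
    with session.no_autoflush:
        obj = query_func(session).first()
        if not obj:
            obj = instance
            session.add(obj)
    cache[key] = obj
    return obj


class Constraint(object):
    def __init__(self, name):
        self.name = name


session = FakeSession()
c1 = _unique(session, Constraint('min_elevation'),
             lambda: 'min_elevation', lambda s: FakeQuery(None))
c2 = _unique(session, Constraint('min_elevation'),
             lambda: 'min_elevation', lambda s: FakeQuery(None))
assert c1 is c2                  # second call served from the cache
assert len(session.added) == 1   # only one add() reached the session
```

So the `add()` itself happens at most once per key; what surprises me is that it happens at all during what should be a read-only phase.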
>
> The DB schema is organized as follows:
>
> - A table storing the satellites (using the NORAD number as primary key)
> - A table storing the ground stations where to compute the passages of
>   the satellites
> - A table storing the passages of these satellites, with two foreign keys
>   linking each passage to a ground station and a satellite
> - A table storing all the types of constraints a satellite can have
> - A table storing all the types of weights (used to perform the
>   scheduling) that can be assigned to each satellite
>
> Then I configured some association proxies (with the related Unique
> Object pattern) in order to assign the weights and the constraints to
> each satellite as a dictionary:
>
> sat.constraints['min_elevation'] = 10
>
> The details of the relationships are here below:
>
> # Relationship definitions
> orm.mapper(Satellite, satellite, properties={
>     'passages': orm.relationship(Passage,
>                                  backref='satellite',
>                                  order_by=passage.c.aos,
>                                  cascade='all, delete-orphan'),
>     '_constraints': orm.relationship(
>         ConstrainedSatellite, backref='satellite',
>         collection_class=orm.collections.attribute_mapped_collection(
>             'name'),
>         cascade='all, delete-orphan'),
>     '_weights': orm.relationship(
>         WeightedSatellite, backref='satellite',
>         collection_class=orm.collections.attribute_mapped_collection(
>             'name'),
>         lazy='joined',
>         cascade='all, delete-orphan'),
>     '_tle': satellite.c.tle
> })
>
> orm.mapper(Constraint, constraint, properties={
>     'satellites': orm.relationship(ConstrainedSatellite,
>                                    backref='constraint')
> })
>
> orm.mapper(Weight, weight, properties={
>     'satellites': orm.relationship(WeightedSatellite, backref='weight')
> })
>
> orm.mapper(ConstrainedSatellite, constraint_satellite)
> orm.mapper(WeightedSatellite, weight_satellite)
> orm.mapper(PassageData, passage_data)
>
> orm.mapper(Passage, passage, properties={
>     'angles': orm.relationship(PassageData, backref='passage',
>                                order_by=passage_data.c.time,
>                                cascade='all, delete-orphan')
>     },
>     confirm_deleted_rows=False
> )
>
> orm.mapper(GroundStation, ground_station, properties={
>     'passages': orm.relationship(Passage, backref='ground_station',
>                                  order_by=passage.c.aos,
>                                  cascade='all, delete-orphan')
> })
>
> # Association Proxies
> Satellite.constraints = association_proxy(
>     '_constraints', 'value', creator=constrained_sat_creator
> )
>
> Satellite.weights = association_proxy(
>     '_weights', 'value', creator=weighted_sat_creator
> )
>
> ConstrainedSatellite.constraint_name = association_proxy('constraint',
>                                                          'name')
> WeightedSatellite.weight_name = association_proxy('weight', 'name')
>
> From the tests everything is working as expected. The problem is that I'm
> getting some performance issues while performing the scheduling procedure:
> this procedure needs to read the `constraints` and `weights` associated to
> each satellite in order to select the best passages among all the possible
> ones on each ground station. While reading the values of the `constraints`
> and `weights`, SQLAlchemy needs to store the Passage objects in the
> database. So all the passages are stored in the database during the
> propagation of all the possible passages, and then I need to manually
> expunge or delete from the DB the passages that haven't been scheduled.
>
> What I would like to achieve is to "shut down" the SQLAlchemy persistence
> after selecting the ground stations and the satellites for which to
> perform the optimization, so that the optimization procedure runs only on
> Python objects, without SQLAlchemy storing them in the DB, and then to
> efficiently store just the optimized, smaller list of passages computed
> by the scheduling algorithm.
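
(To clarify what I mean by persisting only the selected subset, here is a plain-Python sketch with no SQLAlchemy involved; `Passage` and the selection rule are bare stand-ins, not my real classes. It also uses a set for the membership test, which avoids the O(n^2) list scan my current cleanup loop does:)

```python
class Passage(object):
    """Bare stand-in for the real mapped Passage class."""
    def __init__(self, aos, score):
        self.aos = aos
        self.score = score


def schedule(passages):
    # Stand-in for the real scheduling algorithm: keep high-score passes.
    return [p for p in passages if p.score > 0.5]


# Build all candidate passages as detached plain objects (scores 0.0-0.9).
all_passages = [Passage(aos=i, score=i / 10.0) for i in range(10)]
scheduled = schedule(all_passages)

# Membership test against a set of ids is O(1) per lookup, instead of
# the O(n) `passage not in scheduled_passages` list scan:
scheduled_ids = {id(p) for p in scheduled}
discarded = [p for p in all_passages if id(p) not in scheduled_ids]

# Only `scheduled` would ever be handed to session.add_all();
# `discarded` is never persisted, so nothing has to be DELETEd later.
assert len(scheduled) + len(discarded) == len(all_passages)
```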
>
> Currently the scheduling script is like this (schematically):
>
> # Create a DB session
> session_factory = orm.sessionmaker(db)
> session = orm.scoped_session(session_factory)
>
> # Retrieve satellites from DB
> sats = session.query(Satellite).all()
>
> # Retrieve ground stations
> ground_stations = session.query(GroundStation).all()
>
> # Init Scheduler instance
> scheduler = Scheduler(sats, ground_stations, start_day)
>
> # This method generates all the possible passages of all satellites on
> # all ground stations. It needs to read `constraints` and `weights` for
> # each satellite to perform what it needs. Currently all the passages
> # get stored in the DB.
> all_passages = scheduler.propagate()
>
> # This method selects, from the totality of all_passages, a subset
> # that fulfills all the constraints.
> scheduled_passages = scheduler.schedule(all_passages, iterations, pool)
>
> # Remove from session passages not scheduled
> # === This is the thing I would like to avoid since it's slow ===
> for passage in all_passages:
>     if passage not in scheduled_passages:
>         session.expunge(passage)
>         session.delete(passage)
>
> session.commit()
> session.close()
>
> As you can see, I need to manually expunge or delete the passages that
> are not scheduled by the algorithm. If the number of passages is huge,
> that means performing a DELETE query for each passage, which is slow.
> What I would like to achieve is to have the `scheduler.propagate` and
> `scheduler.schedule` methods perform operations only on Python objects
> (fully loaded with all the correct parameters from the DB), avoiding the
> need to store and then delete all the passages in the DB. Can I "shut
> down" the DB persistence temporarily?
>
> Thanks for your help,
> RdB
>
> PS: I'm on MySQL (MariaDB)

-- 
SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.

You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
