On Tue, Mar 6, 2018 at 5:14 AM, KCY <kevint...@hotmail.com> wrote: > Context > > I'm currently designing the business and persistence layer that is going to > be used in various frontend applications (Web and standalone). To that end > I've been trying to reconcile ORM entities with a tight session scope but > I'm constantly running into the same issues. For web I haven't had a big > problem as sessions scoped to a single request work just fine, but for > standalone applications I've been having architectural problems. To > illustrate this, I'm using a simplified example. > > Setup - Model and database definition > > # MODELS > > class _Base(object): > @declared_attr > def __tablename__(cls): # This just generates table name from the > class name. > name = cls.__name__ > table_name = name[0] > for c in name[1:]: > if c.isupper(): > table_name += '_' > table_name += c > return table_name.lower() > > id = Column(Integer, primary_key=True) > > > Base = declarative_base(cls=_Base) > > class Tree(Base): > type = Column(String(200)) > > branches = relationship("Branch", back_populates="tree") > > def __repr__(self): > return "<Tree(id='{}', type='{}', branches='{}')>".format(self.id, > self.type, self.branches) > > > class Branch(Base): > name = Column(String(200)) > > tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False) > tree = relationship("Tree", back_populates='branches') > > > leaves = relationship("Leaf", back_populates='branch') > > def __repr__(self): > return "<Branch(id='{}', name='{}', leaves='{}')>".format(self.id, > self.name, self.leaves) > > > class Leaf(Base): > size = Column(Integer) > > branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False) > branch = relationship("Branch", back_populates='leaves') > > def __repr__(self): > return "<Leaf(id='{}', size='{}')>".format(self.id, self.size) > > > # Database setup > > db_conn_string = "sqlite://" > self.engine = create_engine(db_conn_string) > Base.metadata.bind = self.engine > self.DBSession = sessionmaker(bind=self.engine) > > > Goals > > Since this layer will be interacted with by many other developers in the > future I wanted to abstract away session management. This in turn means I > want to prevent potential side effects from occurring. > > As an example: In a scenario where someone modifies a Leaf object (but > doesn't want to save it yet) and then proceeds to modify a Branch object > elsewhere and saves it. The expected behaviour here is that the Branch > modification is persisted, but not the Leaf changes. So clearly having an > application wide session is not a good idea, but how do I properly separate > this out?
Some background on this problem is first at: http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it which you might have read already, and also I did a talk that tries to define what perspective the Session is coming from at: http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth So as far as goals, abstracting away session management is absolutely a great idea and all the things I've written suggest that this is the case. It doesn't however imply that the entire session is invisible, only that the points at which the session is started and ended are defined in just one place in the application. The web app case makes this easy since you link the session to the request, but other approaches including having context managers (e.g. with transaction():) or decorators. You can still have explicit scopes in an application, I just recommend hiding away as much nuts and bolts as is possible. Next part of "goals" here, you refer to an example use case. I think part of the key is just looking at the terms you used: "save" a leaf, "save" a branch. We all know the term "save" because that's what we use to refer to document management software, e.g. a word processor, graphical editing tool, or virtually anything else: we "open" our document, we "edit" it, then we "save" it. The notion that the document is separate from some place that it gets stored is intrinsic. Note that in SQLAlchemy's Session API, the word "save" is not generally used (just for the "save-update" cascade option). We instead use "add()" and "commit()". These terms are intentional and they are intended to emphasize that SQLAlchemy's ORM does not view relationally-persisted Python objects with a document-oriented model, because that's not actually how the database sees them. In your example, Tree, Leaf and Branch are highly interrelated - they each have a non-nullable foreign key to their parent table. It is therefore very awkward to say that we want to "save" one and not the other kind of object; while a "save" of a Tree without the Branch makes sense, it does not make sense to "save" the Branch without the Tree because of the dependencies. If we try to apply the reality of Tree/Leaf/Branch to the document model, it's like saying you're in a word processor, and your users would want to "save" every third paragraph of the document, but not the other two. This is not feasible or even useful. In reality, the user works with the "document" and the formatting, paragraphs and text within it are all components of that single unit. In a relational database, the single unit we deal with is the transaction - that's the thing we are "opening" and "saving", if anything, even though this doesn't fit quite so well. The transaction represents this workspace that we ask our database to create for us, within which we manipulate as much data as we'd like, then we persist it back. I wouldn't build an application that tries to address the case of the user that wants to "save" a branch but not a leaf, I would address the use case of an application where the user wants to open up a session that works with a series of interlinked objects and persists it. That is, while I don't think you should have them setting up their own sessionmaker() options or figuring out what to do when an exception is thrown and the session must be rolled back, if you don't know up front at what point these applications will want to initiate the process of working with a transaction, then that has to be exposed as an API they can use. Within the realm of GUI, where the GUI needs to access the data model in order to paint the screen, and you don't want this operation to imply a database transaction, I would argue that there should be a "view" layer that represents how the GUI is rendered, which can generate its state given a series of ORM objects. This would be scenario three, but I tend to see it more as a separation of "view" and "model" not so much "domain" and "entity", the ORM is still doing the domain/entity part for you. If you are truly going for a high level, very generically abstractable system, then you have to go there. I don't see how that leads to the conclusion that you would want to "write SQL directly" however. the SQL is still something that the library will generate for you at great savings of time and maintenance. anyway, good discussion, the GUI app model is a tricky one and I don't think there are easy answers. I would seek to build an ORM/persistence model that doesn't try to worry about the view / application layer, however. > > Attempted solutions > > 1. Detached entities with eager loading. > > Initially I simply closed each session once a context was closed and tried > to use the detached objects. > > class RepositoryContext(object): > def __enter__(self): > self.session = get_session() > return CrudRepository(self.session) # Provides simple crud > methods like add(entity), retrieve_all(entity_class), etc... > > def __exit__(self, exc_type, exc_val, exc_tb): > try: > self.session.commit() > except Exception: > self.session.rollback() > raise > finally: > self.session.close() > > > I mark all relationships I want as eager loaded relations using > `lazy='subquery'` and remove relationship definitions where that is not the > case. My new model looks like this: > > class Tree(Base): > type = Column(String(200)) > > branches = relationship("Branch", lazy="subquery") > > def __repr__(self): > return "<Tree(id='{}', type='{}', branches='{}')>".format(self.id, > self.type, self.branches) > > > class Branch(Base): > name = Column(String(200)) > > tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False) > > > def __repr__(self): > return "<Branch(id='{}', name='{}')>".format(self.id, self.name) > > > class Leaf(Base): > size = Column(Integer) > > branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False) > > def __repr__(self): > return "<Leaf(id='{}', size='{}')>".format(self.id, self.size) > > > So if I want to get a relationship that was previously lazy loaded I'd have > to load it within a RepositoryContext, which I could live with. > > The problem happens when I start updating entries. Because of the detached > nature I'm forced to manually refresh entities each time they are updated. > This means instead of a simple update statement I now have to perform this > merge-commit-add-refresh cycle for every entity. It technically works but > it's performing a lot more database requests than it should and I fear this > will not scale properly. > > 2. Separate commit session > > Another solution I've tried is to have two sessions, one that is application > wide and another that is newly created within a new context. The idea is to > have a "link_session" to which entities keep attached to so they can > load/refresh attributes and have a "merge_session" which perform > insert/updates/removals. Whilst seeming like a good idea at first I seem to > be having trouble actually transferring objects to the link_session after > adding them. My current solution (which is only called within a context): > > def add(self, entity): > self._merge_session.add(entity) > try: > self._merge_session.commit() > except FlushError: > self._merge_session.rollback() > raise > self._merge_session.expunge(entity) > self._link_session.add(entity) > > > 3. Complete separation of domain and entity models > > I initially went down this route but as I went on it basically looked like I > was essentially not using the ORM at all and constantly mapping entities to > domain objects and back again, at which point I might as well write the SQL > directly in the repository functions. > > --- > > Any thoughts on these or better approaches would be much appreciated. I may > have been staring at this problem for too long and lost sight of some simple > solution. > > Kind Regards, > > Kevin CYT > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.