Re: [sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

Mike Bayer Tue, 06 Mar 2018 10:09:57 -0800

On Tue, Mar 6, 2018 at 5:14 AM, KCY <kevint...@hotmail.com> wrote:
> Context
>
> I'm currently designing the business and persistence layer that is going to
> be used in various frontend applications (Web and standalone). To that end
> I've been trying to reconcile ORM entities with a tight session scope but
> I'm constantly running into the same issues. For web I haven't had a big
> problem as sessions scoped to a single request work just fine, but for
> standalone applications I've been having architectural problems. To
> illustrate this, I'm using a simplified example.
>
> Setup - Model and database definition
>
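> # MODELS
>
> from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
> from sqlalchemy.ext.declarative import declarative_base, declared_attr
> from sqlalchemy.orm import relationship, sessionmaker
>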
> class _Base(object):
>     @declared_attr
>     def __tablename__(cls):
>         # This just generates the table name from the class name.
>         name = cls.__name__
>         table_name = name[0]
>         for c in name[1:]:
>             if c.isupper():
>                 table_name += '_'
>             table_name += c
>         return table_name.lower()
>
>     id = Column(Integer, primary_key=True)
>
>
> Base = declarative_base(cls=_Base)
>
> class Tree(Base):
>     type = Column(String(200))
>
>     branches = relationship("Branch", back_populates="tree")
>
>     def __repr__(self):
>         return "<Tree(id='{}', type='{}', branches='{}')>".format(
>             self.id, self.type, self.branches)
>
>
> class Branch(Base):
>     name = Column(String(200))
>
>     tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False)
>     tree = relationship("Tree", back_populates='branches')
>
>
>     leaves = relationship("Leaf", back_populates='branch')
>
>     def __repr__(self):
>         return "<Branch(id='{}', name='{}', leaves='{}')>".format(
>             self.id, self.name, self.leaves)
>
>
> class Leaf(Base):
>     size = Column(Integer)
>
>     branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False)
>     branch = relationship("Branch", back_populates='leaves')
>
>     def __repr__(self):
>         return "<Leaf(id='{}', size='{}')>".format(self.id, self.size)
>
>
> # Database setup
>
> db_conn_string = "sqlite://"
> self.engine = create_engine(db_conn_string)
> Base.metadata.bind = self.engine
> self.DBSession = sessionmaker(bind=self.engine)
>
>
> Goals
>
> Since this layer will be interacted with by many other developers in the
> future I wanted to abstract away session management. This in turn means I
> want to prevent potential side effects from occurring.
>
> As an example: In a scenario where someone modifies a Leaf object (but
> doesn't want to save it yet) and then proceeds to modify a Branch object
> elsewhere and saves it. The expected behaviour here is that the Branch
> modification is persisted, but not the Leaf changes. So clearly having an
> application wide session is not a good idea, but how do I properly separate
> this out?


Some background on this problem is at:
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
which you might have read already. I also gave a talk that tries
to define the perspective the Session is coming from:
http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth

As far as goals go, abstracting away session management is absolutely
a great idea, and everything I've written recommends it.   It doesn't,
however, imply that the entire session is invisible, only that the
points at which the session is started and ended are defined in just
one place in the application.   The web app case makes this easy since
you link the session to the request, but other approaches include
context managers (e.g. with transaction():) or decorators.   You can
still have explicit scopes in an application; I just recommend hiding
away as many of the nuts and bolts as possible.
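
As a minimal sketch of what "defined in just one place" could look
like, the helper below wraps the begin / commit / rollback / close
steps in a context manager and a decorator.  The names transaction()
and transactional() are hypothetical, and session_factory is assumed
to be any callable that returns an object with commit(), rollback()
and close() methods, such as the sessionmaker from the original post:

```python
import contextlib
import functools


@contextlib.contextmanager
def transaction(session_factory):
    """Open a session, commit on success, roll back on error, always close.

    `session_factory` is an assumption: any zero-argument callable that
    returns an object exposing commit(), rollback() and close().
    """
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


def transactional(session_factory):
    """Decorator form: the wrapped function receives the session first."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with transaction(session_factory) as session:
                return func(session, *args, **kwargs)
        return wrapper
    return decorator
```

Application code would then write "with transaction(DBSession) as
session:" or decorate a function with "@transactional(DBSession)",
and never call commit(), rollback() or close() itself.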

On the next part of "goals", you refer to an example use case.   I think
part of the key is just looking at the terms you used: "save" a leaf,
"save" a branch.   We all know the term "save" from document-oriented
software, e.g. a word processor, graphical editing tool, or virtually
anything else: we "open" our document, we "edit" it, then we "save"
it.  The notion that the document is separate from the place where it
gets stored is intrinsic.

Note that in SQLAlchemy's Session API, the word "save" is not
generally used (just for the "save-update" cascade option).   We
instead use "add()" and "commit()".   These terms are intentional:
they emphasize that SQLAlchemy's ORM does not view
relationally-persisted Python objects through a document-oriented
model, because that's not actually how the database sees them.    In
your example, Tree, Branch and Leaf are highly interrelated - Branch
and Leaf each have a non-nullable foreign key to their parent table.
It is therefore very awkward to say that we want to "save" one and not
the other kind of object; while a "save" of a Tree without its
Branches makes sense, it does not make sense to "save" a Branch
without its Tree because of the dependencies.
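
To make the dependency concrete, here is a small sketch that uses the
standard library's sqlite3 module directly, rather than SQLAlchemy, so
that the database's own view is visible.  The table layout is an
assumption derived from the models in the original post, and
try_insert_branch() is a hypothetical helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, type VARCHAR(200))")
conn.execute(
    "CREATE TABLE branch ("
    " id INTEGER PRIMARY KEY,"
    " name VARCHAR(200),"
    " tree_id INTEGER NOT NULL REFERENCES tree (id))"
)


def try_insert_branch(tree_id):
    """Return True if the branch row was accepted, False if it was refused."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO branch (name, tree_id) VALUES (?, ?)",
                ("lower", tree_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A branch with no parent at all violates NOT NULL.
orphan_without_parent = try_insert_branch(None)
# A branch pointing at a tree that does not exist violates the foreign key.
orphan_bad_parent = try_insert_branch(999)

# Once the parent row exists, the same insert is accepted.
with conn:
    conn.execute("INSERT INTO tree (id, type) VALUES (1, 'oak')")
branch_with_parent = try_insert_branch(1)
```

The first two inserts are refused and the third is accepted, which is
the sense in which a Branch cannot be stored independently of its
Tree.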

If we try to apply the reality of Tree/Leaf/Branch to the document
model, it's like saying you're in a word processor, and your users
would want to "save" every third paragraph of the document, but not
the other two.   This is not feasible or even useful.   In reality,
the user works with the "document" and the formatting, paragraphs and
text within it are all components of that single unit.

In a relational database, the single unit we deal with is the
transaction - that's the thing we are "opening" and "saving", if
anything, even though the analogy doesn't fit quite so well.  The
transaction represents a workspace that we ask our database to create
for us, within which we manipulate as much data as we'd like, then we
persist it back.     I wouldn't build an application that tries to
address the case of the user that wants to "save" a branch but not a
leaf; I would address the use case of an application where the user
wants to open up a session that works with a series of interlinked
objects and persists them together.  That is, I don't think you should
have them setting up their own sessionmaker() options or figuring out
what to do when an exception is thrown and the session must be rolled
back.  But if you don't know up front at what point these applications
will want to begin working with a transaction, then that has to be
exposed as an API they can use.
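
The "workspace" behaviour can be seen with a few lines of plain
sqlite3 (again the standard library rather than SQLAlchemy; the
simplified tables and the snapshot() helper are assumptions made for
illustration).  Edits to a branch and a leaf made in the same
transaction are discarded together on rollback and persisted together
on commit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE leaf ("
    " id INTEGER PRIMARY KEY, size INTEGER, branch_id INTEGER NOT NULL)"
)
conn.execute("INSERT INTO branch (id, name) VALUES (1, 'lower')")
conn.execute("INSERT INTO leaf (id, size, branch_id) VALUES (1, 10, 1)")
conn.commit()


def snapshot():
    """Return the current (branch name, leaf size) pair."""
    name = conn.execute("SELECT name FROM branch WHERE id = 1").fetchone()[0]
    size = conn.execute("SELECT size FROM leaf WHERE id = 1").fetchone()[0]
    return name, size


# Edit both rows inside one transaction, then abandon the workspace.
conn.execute("UPDATE branch SET name = 'upper' WHERE id = 1")
conn.execute("UPDATE leaf SET size = 99 WHERE id = 1")
conn.rollback()
after_rollback = snapshot()

# Make the same edits again, and this time persist the workspace.
conn.execute("UPDATE branch SET name = 'upper' WHERE id = 1")
conn.execute("UPDATE leaf SET size = 99 WHERE id = 1")
conn.commit()
conn.rollback()  # nothing is pending any more, so this changes nothing
after_commit = snapshot()
```

There is no step in between at which only the branch edit is
persisted; keeping the leaf edit out would mean making it in a
different transaction.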

Within the realm of GUIs, where the GUI needs to access the data model
in order to paint the screen and you don't want this operation to
imply a database transaction, I would argue that there should be a
"view" layer that represents how the GUI is rendered, which can
generate its state given a series of ORM objects.    This would be
your scenario three, but I tend to see it more as a separation of
"view" and "model" than of "domain" and "entity"; the ORM is still
doing the domain/entity part for you.  If you are truly going for a
high level, very generically abstractable system, then you have to go
there.   I don't see how that leads to the conclusion that you would
want to "write SQL directly", however.   The SQL is still something
that the library will generate for you at great savings of time and
maintenance.
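
One possible shape for such a "view" layer, sketched with plain
dataclasses.  The class names and build_tree_view() are hypothetical;
the function only assumes it is handed an object with the attributes
of the Tree / Branch / Leaf models from the original post:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LeafView:
    id: int
    size: int


@dataclass(frozen=True)
class BranchView:
    id: int
    name: str
    leaves: Tuple[LeafView, ...]


@dataclass(frozen=True)
class TreeView:
    id: int
    type: str
    branches: Tuple[BranchView, ...]


def build_tree_view(tree):
    """Copy everything the GUI needs out of a loaded Tree and its children.

    Only attribute access is used, so any lazy loads happen here, while
    the caller still has the session open.
    """
    return TreeView(
        id=tree.id,
        type=tree.type,
        branches=tuple(
            BranchView(
                id=branch.id,
                name=branch.name,
                leaves=tuple(
                    LeafView(id=leaf.id, size=leaf.size)
                    for leaf in branch.leaves
                ),
            )
            for branch in tree.branches
        ),
    )
```

The idea would be to call build_tree_view() inside the session scope
and hand only the resulting immutable view to the painting code, so
that rendering never triggers a load or starts a transaction.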

Anyway, good discussion. The GUI app model is a tricky one and I don't
think there are easy answers.   I would, however, seek to build an
ORM/persistence model that doesn't try to worry about the view /
application layer.

>
> Attempted solutions
>
> 1. Detached entities with eager loading.
>
> Initially I simply closed each session once a context was closed and tried
> to use the detached objects.
>
> class RepositoryContext(object):
>     def __enter__(self):
>         self.session = get_session()
>         # Provides simple crud methods like add(entity),
>         # retrieve_all(entity_class), etc...
>         return CrudRepository(self.session)
>
>     def __exit__(self, exc_type, exc_val, exc_tb):
>         try:
>             self.session.commit()
>         except Exception:
>             self.session.rollback()
>             raise
>         finally:
>             self.session.close()
>
>
> I mark all relationships I want as eager loaded relations using
> `lazy='subquery'` and remove relationship definitions where that is not the
> case. My new model looks like this:
>
> class Tree(Base):
>     type = Column(String(200))
>
>     branches = relationship("Branch", lazy="subquery")
>
>     def __repr__(self):
>         return "<Tree(id='{}', type='{}', branches='{}')>".format(
>             self.id, self.type, self.branches)
>
>
> class Branch(Base):
>     name = Column(String(200))
>
>     tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False)
>
>
>     def __repr__(self):
>         return "<Branch(id='{}', name='{}')>".format(self.id, self.name)
>
>
> class Leaf(Base):
>     size = Column(Integer)
>
>     branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False)
>
>     def __repr__(self):
>         return "<Leaf(id='{}', size='{}')>".format(self.id, self.size)
>
>
> So if I want to get a relationship that was previously lazy loaded I'd have
> to load it within a RepositoryContext, which I could live with.
>
> The problem happens when I start updating entries. Because of the detached
> nature I'm forced to manually refresh entities each time they are updated.
> This means instead of a simple update statement I now have to perform this
> merge-commit-add-refresh cycle for every entity. It technically works but
> it's performing a lot more database requests than it should and I fear this
> will not scale properly.
>
> 2. Separate commit session
>
> Another solution I've tried is to have two sessions, one that is application
> wide and another that is newly created within a new context. The idea is to
> have a "link_session" to which entities keep attached to so they can
> load/refresh attributes and have a "merge_session" which perform
> insert/updates/removals. Whilst seeming like a good idea at first I seem to
> be having trouble actually transferring objects to the link_session after
> adding them. My current solution (which is only called within a context):
>
> def add(self, entity):
>     self._merge_session.add(entity)
>     try:
>         self._merge_session.commit()
>     except FlushError:
>         self._merge_session.rollback()
>         raise
>     self._merge_session.expunge(entity)
>     self._link_session.add(entity)
>
>
> 3. Complete separation of domain and entity models
>
> I initially went down this route but as I went on it basically looked like I
> was essentially not using the ORM at all and constantly mapping entities to
> domain objects and back again, at which point I might as well write the SQL
> directly in the repository functions.
>
> ---
>
> Any thoughts on these or better approaches would be much appreciated. I may
> have been staring at this problem for too long and lost sight of some simple
> solution.
>
> Kind Regards,
>
> Kevin CYT
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
