Re: [sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

2018-03-06 Thread KCY
On Tuesday, 6 March 2018 21:12:01 UTC+1, Mike Bayer wrote:
>
> On Tue, Mar 6, 2018 at 2:52 PM, KCY  
>> wrote: 
>> > First off thank you for the quick reply. I have seen those resources 
>> you 
>> > linked a few days ago and it guided me partially to my current ideas. 
>> The 
>> > RepositoryContext class is essentially the contextmanager example with 
>> some 
>> > extra helper methods. 
>> > 
>> > I think in trying to keep my example concise I left out too much detail 
>> to 
>> > illustrate the potential problem I'm trying to avoid. To expand upon 
>> it, I 
>> > can imagine a scenario where there are 2 branches, each with their set 
>> of 
>> > leaves. In a scenario with some tabbed UI where both branches are 
>> "open" 
>> > some editing happens on both but that the intention upon saving is to 
>> only 
>> > save the current tab's branch but not commit any of the other changes. 
>>
>> OK so within the scope of "editing" that's where I'd put transactional 
>> scope.E.g. transaction per tab.  But I wouldn't leave these 
>> transactions open for an event-driven GUI waiting for input, only when 
>> it's time to manipulate the state of the database. 
>>
>
That seems like a good idea, I'll have to experiment a bit to see what we 
end up with.

I'll post an update once I have a more fleshed out solution for anyone else 
who is interested.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

2018-03-06 Thread Mike Bayer
On Tue, Mar 6, 2018 at 2:52 PM, KCY  wrote:
> First off thank you for the quick reply. I have seen those resources you
> linked a few days ago and it guided me partially to my current ideas. The
> RepositoryContext class is essentially the contextmanager example with some
> extra helper methods.
>
> I think in trying to keep my example concise I left out too much detail to
> illustrate the potential problem I'm trying to avoid. To expand upon it, I
> can imagine a scenario where there are 2 branches, each with their set of
> leaves. In a scenario with some tabbed UI where both branches are "open"
> some editing happens on both but that the intention upon saving is to only
> save the current tab's branch but not commit any of the other changes.

OK so within the scope of "editing" that's where I'd put transactional
scope.E.g. transaction per tab.  But I wouldn't leave these
transactions open for an event-driven GUI waiting for input, only when
it's time to manipulate the state of the database.

>
> Based on your reply I do think that perhaps I'm trying to cover all my use
> cases too soon. Given that I do not know all the potential requirements it
> is not feasible to hide away the session completely. To continue on the
> expanded example; this is much easier to solve within a controller /
> presenter type layer that tracks which tab corresponds to which entity and
> then propagating changes to the corresponding entities based on the
> command(s) given.

yeah i was kind of suggesting boring old MVC :)   (like the real, pre-web kind).


>
> For now I think I will take a step back from the persistence layer and flesh
> out the remaining parts of the application. Once I get around to integrating
> it with a proper GUI I will finalize the design.
>
> Thanks again for the quick and detailed reply.
>
>
> On Tuesday, 6 March 2018 19:09:18 UTC+1, Mike Bayer wrote:
>>
>>
>> Some background on this problem is first at:
>>
>> http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
>>  which you might have read already, and also I did a talk that tries
>> to define what perspective the Session is coming from at:
>> http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth
>>
>> So as far as goals, abstracting away session management is absolutely
>> a great idea and all the things I've written suggest that this is the
>> case.   It doesn't however imply that the entire session is invisible,
>> only that the points at which the session is started and ended are
>> defined in just one place in the application.   The web app case makes
>> this easy since you link the session to the request, but other
>> approaches including having context managers (e.g. with
>> transaction():) or decorators.   You can still have explicit scopes in
>> an application, I just recommend hiding away as much nuts and bolts as
>> is possible.
>>
>> Next part of "goals" here, you refer to an example use case.   I think
>> part of the key is just looking at the terms you used: "save" a leaf,
>> "save" a branch.   We all know the term "save" because that's what we
>> use to refer to document management software, e.g. a word processor,
>> graphical editing tool, or virtually anything else: we "open" our
>> document, we "edit" it, then we "save" it.  The notion that the
>> document is separate from some place that it gets stored is intrinsic.
>>
>> Note that in SQLAlchemy's Session API, the word "save" is not
>> generally used (just for the "save-update" cascade option).   We
>> instead use "add()" and "commit()".   These terms are intentional and
>> they are intended to emphasize that SQLAlchemy's ORM does not view
>> relationally-persisted Python objects with a document-oriented model,
>> because that's not actually how the database sees them.In your
>> example, Tree, Leaf and Branch are highly interrelated - they each
>> have a non-nullable foreign key to their parent table.   It is
>> therefore very awkward to say that we want to "save" one and not the
>> other kind of object; while a "save" of a Tree without the Branch
>> makes sense, it does not make sense to "save" the Branch without the
>> Tree because of the dependencies.
>>
>> If we try to apply the reality of Tree/Leaf/Branch to the document
>> model, it's like saying you're in a word processor, and your users
>> would want to "save" every third paragraph of the document, but not
>> the other two.   This is not feasible or even useful.   In reality,
>> the user works with the "document" and the formatting, paragraphs and
>> text within it are all components of that single unit.
>>
>> In a relational database, the single unit we deal with is the
>> transaction - that's the thing we are "opening" and "saving", if
>> anything, even though this doesn't fit quite so well.  The transaction
>> represents this workspace that we ask our database to create for us,
>> within which we 

Re: [sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

2018-03-06 Thread KCY
First off thank you for the quick reply. I have seen those resources you 
linked a few days ago and it guided me partially to my current ideas. The 
RepositoryContext class is essentially the contextmanager example with some 
extra helper methods.

I think in trying to keep my example concise I left out too much detail to 
illustrate the potential problem I'm trying to avoid. To expand upon it, I 
can imagine a scenario where there are 2 branches, each with their set of 
leaves. In a scenario with some tabbed UI where both branches are "open" 
some editing happens on both but that the intention upon saving is to only 
save the current tab's branch but not commit any of the other changes.

Based on your reply I do think that perhaps I'm trying to cover all my use 
cases too soon. Given that I do not know all the potential requirements it 
is not feasible to hide away the session completely. To continue on the 
expanded example; this is much easier to solve within a controller / 
presenter type layer that tracks which tab corresponds to which entity and 
then propagating changes to the corresponding entities based on the 
command(s) given.

For now I think I will take a step back from the persistence layer and 
flesh out the remaining parts of the application. Once I get around to 
integrating it with a proper GUI I will finalize the design.

Thanks again for the quick and detailed reply.

On Tuesday, 6 March 2018 19:09:18 UTC+1, Mike Bayer wrote:

>
> Some background on this problem is first at: 
>
> http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
>  
>  which you might have read already, and also I did a talk that tries 
> to define what perspective the Session is coming from at: 
> http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth 
>
> So as far as goals, abstracting away session management is absolutely 
> a great idea and all the things I've written suggest that this is the 
> case.   It doesn't however imply that the entire session is invisible, 
> only that the points at which the session is started and ended are 
> defined in just one place in the application.   The web app case makes 
> this easy since you link the session to the request, but other 
> approaches including having context managers (e.g. with 
> transaction():) or decorators.   You can still have explicit scopes in 
> an application, I just recommend hiding away as much nuts and bolts as 
> is possible. 
>
> Next part of "goals" here, you refer to an example use case.   I think 
> part of the key is just looking at the terms you used: "save" a leaf, 
> "save" a branch.   We all know the term "save" because that's what we 
> use to refer to document management software, e.g. a word processor, 
> graphical editing tool, or virtually anything else: we "open" our 
> document, we "edit" it, then we "save" it.  The notion that the 
> document is separate from some place that it gets stored is intrinsic. 
>
> Note that in SQLAlchemy's Session API, the word "save" is not 
> generally used (just for the "save-update" cascade option).   We 
> instead use "add()" and "commit()".   These terms are intentional and 
> they are intended to emphasize that SQLAlchemy's ORM does not view 
> relationally-persisted Python objects with a document-oriented model, 
> because that's not actually how the database sees them.In your 
> example, Tree, Leaf and Branch are highly interrelated - they each 
> have a non-nullable foreign key to their parent table.   It is 
> therefore very awkward to say that we want to "save" one and not the 
> other kind of object; while a "save" of a Tree without the Branch 
> makes sense, it does not make sense to "save" the Branch without the 
> Tree because of the dependencies. 
>
> If we try to apply the reality of Tree/Leaf/Branch to the document 
> model, it's like saying you're in a word processor, and your users 
> would want to "save" every third paragraph of the document, but not 
> the other two.   This is not feasible or even useful.   In reality, 
> the user works with the "document" and the formatting, paragraphs and 
> text within it are all components of that single unit. 
>
> In a relational database, the single unit we deal with is the 
> transaction - that's the thing we are "opening" and "saving", if 
> anything, even though this doesn't fit quite so well.  The transaction 
> represents this workspace that we ask our database to create for us, 
> within which we manipulate as much data as we'd like, then we persist 
> it back. I wouldn't build an application that tries to address the 
> case of the user that wants to "save" a branch but not a leaf, I would 
> address the use case of an application where the user wants to open up 
> a session that works with a series of interlinked objects and persists 
> it.  That is, while I don't think you should have them setting up 
> their own sessionmaker() 

Re: [sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

2018-03-06 Thread Mike Bayer
On Tue, Mar 6, 2018 at 5:14 AM, KCY  wrote:
> Context
>
> I'm currently designing the business and persistence layer that is going to
> be used in various frontend applications (Web and standalone). To that end
> I've been trying to reconcile ORM entities with a tight session scope but
> I'm constantly running into the same issues. For web I haven't had a big
> problem as sessions scoped to a single request work just fine, but for
> standalone applications I've been having architectural problems. To
> illustrate this, I'm using a simplified example.
>
> Setup - Model and database definition
>
> # MODELS
>
> class _Base(object):
> @declared_attr
> def __tablename__(cls):# This just generates table name from the
> class name.
> name = cls.__name__
> table_name = name[0]
> for c in name[1:]:
> if c.isupper():
> table_name += '_'
> table_name += c
> return table_name.lower()
>
> id = Column(Integer, primary_key=True)
>
>
> Base = declarative_base(cls=_Base)
>
> class Tree(Base):
> type = Column(String(200))
>
> branches = relationship("Branch", back_populates="tree")
>
> def __repr__(self):
> return "".format(self.id,
> self.type, self.branches)
>
>
> class Branch(Base):
> name = Column(String(200))
>
> tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False)
> tree = relationship("Tree", back_populates='branches')
>
>
> leaves = relationship("Leaf", back_populates='branch')
>
> def __repr__(self):
> return "".format(self.id,
> self.name, self.leaves)
>
>
> class Leaf(Base):
> size = Column(Integer)
>
> branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False)
> branch = relationship("Branch", back_populates='leaves')
>
> def __repr__(self):
> return "".format(self.id, self.size)
>
>
> # Database setup
>
> db_conn_string = "sqlite://"
> self.engine = create_engine(db_conn_string)
> Base.metadata.bind = self.engine
> self.DBSession = sessionmaker(bind=self.engine)
>
>
> Goals
>
> Since this layer will be interacted with by many other developers in the
> future I wanted to abstract away session management. This in turn means I
> want to prevent potential side effects from occurring.
>
> As an example: In a scenario where someone modifies a Leaf object (but
> doesn't want to save it yet) and then proceeds to modify a Branch object
> elsewhere and saves it. The expected behaviour here is that the Branch
> modification is persisted, but not the Leaf changes. So clearly having an
> application wide session is not a good idea, but how do I properly separate
> this out?

Some background on this problem is first at:
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
 which you might have read already, and also I did a talk that tries
to define what perspective the Session is coming from at:
http://www.sqlalchemy.org/library.html#thesqlalchemysessionindepth

So as far as goals, abstracting away session management is absolutely
a great idea and all the things I've written suggest that this is the
case.   It doesn't however imply that the entire session is invisible,
only that the points at which the session is started and ended are
defined in just one place in the application.   The web app case makes
this easy since you link the session to the request, but other
approaches including having context managers (e.g. with
transaction():) or decorators.   You can still have explicit scopes in
an application, I just recommend hiding away as much nuts and bolts as
is possible.

Next part of "goals" here, you refer to an example use case.   I think
part of the key is just looking at the terms you used: "save" a leaf,
"save" a branch.   We all know the term "save" because that's what we
use to refer to document management software, e.g. a word processor,
graphical editing tool, or virtually anything else: we "open" our
document, we "edit" it, then we "save" it.  The notion that the
document is separate from some place that it gets stored is intrinsic.

Note that in SQLAlchemy's Session API, the word "save" is not
generally used (just for the "save-update" cascade option).   We
instead use "add()" and "commit()".   These terms are intentional and
they are intended to emphasize that SQLAlchemy's ORM does not view
relationally-persisted Python objects with a document-oriented model,
because that's not actually how the database sees them.In your
example, Tree, Leaf and Branch are highly interrelated - they each
have a non-nullable foreign key to their parent table.   It is
therefore very awkward to say that we want to "save" one and not the
other kind of object; while a "save" of a Tree without the Branch
makes sense, it does not make sense to 

[sqlalchemy] How to maintain a tight transactional scope whilst allowing lazy loading / attribute refreshing?

2018-03-06 Thread KCY
*Context*

I'm currently designing the business and persistence layer that is going to 
be used in various frontend applications (Web and standalone). To that end 
I've been trying to reconcile ORM entities with a tight session scope but 
I'm constantly running into the same issues. For web I haven't had a big 
problem as sessions scoped to a single request work just fine, but for 
standalone applications I've been having architectural problems. To 
illustrate this, I'm using a simplified example.

*Setup - Model and database definition*

*# MODELS*

class _Base(object):
@declared_attr
def __tablename__(cls):# This just generates table name from the class 
name.
name = cls.__name__
table_name = name[0]
for c in name[1:]:
if c.isupper():
table_name += '_'
table_name += c
return table_name.lower()

id = Column(Integer, primary_key=True)


Base = declarative_base(cls=_Base)

class Tree(Base):
type = Column(String(200))

branches = relationship("Branch", back_populates="tree")

def __repr__(self):
return "".format(self.id, 
self.type, self.branches)


class Branch(Base):
name = Column(String(200))

tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False)
tree = relationship("Tree", back_populates='branches')


leaves = relationship("Leaf", back_populates='branch')

def __repr__(self):
return "".format(self.id, self.name, 
self.leaves)


class Leaf(Base):
size = Column(Integer)

branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False)
branch = relationship("Branch", back_populates='leaves')

def __repr__(self):
return "".format(self.id, self.size)


# Database setup

db_conn_string = "sqlite://"
self.engine = create_engine(db_conn_string)
Base.metadata.bind = self.engine
self.DBSession = sessionmaker(bind=self.engine)


*Goals*

Since this layer will be interacted with by many other developers in the 
future I wanted to abstract away session management. This in turn means I 
want to prevent potential side effects from occurring.

As an example: In a scenario where someone modifies a Leaf object (but 
doesn't want to save it yet) and then proceeds to modify a Branch object 
elsewhere and saves it. The expected behaviour here is that the Branch 
modification is persisted, but not the Leaf changes. So clearly having an 
application wide session is not a good idea, but how do I properly separate 
this out?

*Attempted solutions*

*1. Detached entities with eager loading.*

Initially I simply closed each session once a context was closed and tried 
to use the detached objects.

class RepositoryContext(object):
def __enter__(self):
self.session = get_session()
return CrudRepository(self.session)# Provides simple crud methods 
like add(entity), retrieve_all(entity_class), etc...

def __exit__(self, exc_type, exc_val, exc_tb):
try:
self.session.commit()
except Exception:
self.session.rollback()
raise
finally:
self.session.close()


I mark all relationships I want as eager loaded relations using 
`lazy='subquery'` and remove relationship definitions where that is not the 
case. My new model looks like this:

class Tree(Base):
type = Column(String(200))

branches = relationship("Branch", lazy="subquery")

def __repr__(self):
return "".format(self.id, 
self.type, self.branches)


class Branch(Base):
name = Column(String(200))

tree_id = Column(Integer, ForeignKey('tree.id'), nullable=False)


def __repr__(self):
return "".format(self.id, self.name)


class Leaf(Base):
size = Column(Integer)

branch_id = Column(Integer, ForeignKey('branch.id'), nullable=False)

def __repr__(self):
return "".format(self.id, self.size)


So if I want to get a relationship that was previously lazy loaded I'd have 
to load it within a RepositoryContext, which I could live with.

The problem happens when I start updating entries. Because of the detached 
nature I'm forced to manually refresh entities each time they are updated. 
This means instead of a simple update statement I now have to perform this 
*merge-commit-add-refresh* cycle for every entity. It technically works but 
it's performing a lot more database requests than it should and I fear this 
will not scale properly.

*2. Separate commit session*

Another solution I've tried is to have two sessions, one that is 
application wide and another that is newly created within a new context. 
The idea is to have a "link_session" to which entities keep attached to so 
they can load/refresh attributes and have a "merge_session" which perform 
insert/updates/removals. Whilst seeming like a good idea at first I seem to 
be having trouble