Thanks Michael, I really appreciate the response. This broke something of a
mental log-jam I was having.
The original use I had in mind for this was for my web framework. I was
using it to tag "Session" objects for caching and then attach user-specific
data and frequently accessed data to the session, to minimize database
requests for common data across all of a user's requests. The goal was to
have the caching be fairly transparent to the application so that I wouldn't
have to track invalidations manually. This object cache pattern has been in
the framework for a while, but I haven't had much time to really make it
work properly (it's kind of a side project).
I'd looked at using the cascade, but I was fixated (for some reason) on
checking only the child items with changes (session.dirty/new/deleted) in
the before_flush and after_commit hooks. I was looking for some way to
annotate children with a reference to their top-level parent container, so
that on a child modification I could operate on the container (given A.B.C,
if C is modified by someone, mark A for invalidation), or to somehow walk
the relationship backwards from child to parent. That last bit was where I
got stuck and abandoned this approach. I had some implementations where I
was trying to pass parent references down the chain, but it got complicated
when multiple 'containers' were referencing the same child, and I never got
the passing to work properly. (Any tips on the proper event to listen to
for attribute attach and append?)
Your comment gave me a forehead slap moment. I realized I could just check
*all* of the objects tagged as cache container objects in the current
session and cascade down from each and mark the container for write through
or invalidation if any related objects were modified. The pattern I'm using
only has a few of these 'container' cached objects per web request (like
session above) so checking them all isn't crazy-painful. I did some quick
timeit experiments and it looks like the overhead of these extra checks is
pretty negligible and it will potentially save me a lot of complexity and
db lookups in my current application.
Another optimization I'm thinking of is to record the list of related
child objects on fetch from cache and then compare it to the new list of
related objects on flush. That way newly loaded relationships would get
written through to the cache to be reused on subsequent requests as well.
The current implementation can't see when an object has been lazy-loaded
(but not modified) on the container. Any thoughts on this approach?
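Roughly, the comparison I have in mind would look something like the sketch below. It's untested against the real models: the helper names (related_keys, relationships_changed) and the Item class are hypothetical, and keying on a pk attribute is an assumption (the default falls back to id(), which only works while the same Python objects stay alive).

```python
def related_keys(related_objects, key=id):
    """Snapshot the identities of the related objects currently loaded."""
    return frozenset(key(obj) for obj in related_objects)


def relationships_changed(snapshot, related_objects, key=id):
    """Return True if the set of related objects differs from the snapshot."""
    return related_keys(related_objects, key) != snapshot


class Item:
    """Hypothetical stand-in for a mapped child object."""
    def __init__(self, pk):
        self.pk = pk


def by_pk(obj):
    return obj.pk


# Snapshot taken when the container is fetched from the cache.
loaded_at_fetch = [Item(1), Item(2)]
snapshot = related_keys(loaded_at_fetch, key=by_pk)

# Later, at flush time, a lazy load has pulled in one more child.
loaded_at_flush = loaded_at_fetch + [Item(3)]

unchanged = relationships_changed(snapshot, loaded_at_fetch, key=by_pk)
changed = relationships_changed(snapshot, loaded_at_flush, key=by_pk)
print(unchanged, changed)  # False True
```

The snapshot would have to be stored alongside the container (or on it) at fetch time, which is the part I haven't worked out yet.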
Just for reference, here's a (slightly modified) version of how I'm doing
the invalidation / modification check now. In my quick and dirty testing it
seems to cover all of the stale data cases I was encountering before.
The only other pieces needed to make this work are a cache_key attribute
and a __cached__ tag on objects that will be cached and act as containers.
I then have a cache.get method that I use to request these special
containers. The get method checks memcached first, merges on hit, and
requests from the database on miss.
    session = cache.get(Session, session_id="XXXXX")
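To make the lookup order concrete, here is a stripped-down sketch of that get flow. It is not my actual cache module: ReadThroughCache, the dict standing in for memcached, and the merge and load_from_db callables are all hypothetical, and the lookup is simplified to a single key rather than a class plus filter arguments.

```python
class ReadThroughCache:
    """Hypothetical sketch: check the cache, merge on hit, query on miss."""

    def __init__(self, store, merge, load_from_db):
        self.store = store                # dict standing in for memcached
        self.merge = merge                # e.g. a wrapper around Session.merge
        self.load_from_db = load_from_db  # fallback query on a cache miss

    def get(self, cache_key):
        cached = self.store.get(cache_key)
        if cached is not None:
            # Hit: re-attach the cached copy to the current session.
            return self.merge(cached)
        # Miss: fall back to the database.
        return self.load_from_db(cache_key)


calls = []  # records which path each lookup took


def fake_merge(obj):
    calls.append(("merge", obj))
    return obj


def fake_load(cache_key):
    calls.append(("db", cache_key))
    return "row for " + cache_key


demo_cache = ReadThroughCache(
    {"session:1": "cached session"}, fake_merge, fake_load)
hit = demo_cache.get("session:1")    # served from the store, then merged
miss = demo_cache.get("session:2")   # not in the store, loaded from the db
```

Note that the miss path here does not populate the store; the write happens later, on commit.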
The item gets written through to the cache on successful commit if it's new
or modified (and now, has related children that are new or modified).
Putting it in after_commit also guarantees that invalid data that's rolled
back doesn't accidentally get written to the cache.
from sqlalchemy import event
from sqlalchemy.orm.attributes import instance_state

# `cache` and `models` are application modules that are not shown here.


def is_modified(obj):
    '''Check if SQLAlchemy believes this instance is modified.'''
    return instance_state(obj).modified


def child_relationships(obj):
    '''Yield the instances reachable from obj via save-update cascades.'''
    state = instance_state(obj)
    mapper = obj.__mapper__
    # cascade_iterator yields (object, mapper, state, dict) tuples.
    for related_obj, _mapper, _state, _dict in mapper.cascade_iterator(
            "save-update", state):
        yield related_obj


# Caching / invalidation event handlers and functions.

def cached_objects(objects):
    '''Return objects from a session/iterator that are tagged for caching.'''
    for obj in objects:
        # check if the sqlalchemy object is tagged for caching
        if hasattr(obj, '__cached__'):
            yield obj


def cached_objects_with_updates(objects):
    '''Return objects from a session/iterator that are tagged for caching
    and that need an update.'''
    for obj in objects:
        if getattr(obj, "__needs_update__", False):
            yield obj


def check_needs_update(session, flush_context, instances):
    '''Check for objects in the SQLAlchemy session that are modified and
    tag them for cache update.'''
    for obj in cached_objects(session):
        # Test each collection separately: `a or b` on sets evaluates to
        # the first non-empty set rather than to their union.
        if obj in session.new or obj in session.dirty:
            obj.__needs_update__ = True
        for related_obj in child_relationships(obj):
            if ((related_obj in session.new
                    or related_obj in session.dirty
                    or related_obj in session.deleted)
                    and is_modified(related_obj)):
                obj.__needs_update__ = True


def update_cached_objects(session):
    '''Write out (or invalidate) cache entries for modified entities.'''
    for obj in cached_objects_with_updates(session):
        obj.__needs_update__ = False
        cache.write(obj.cache_key, obj)


event.listen(models.session, "before_flush", check_needs_update)
event.listen(models.session, "after_commit", update_cached_objects)
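One Python detail matters for membership checks that span session.new, session.dirty and session.deleted: `or` between sets evaluates to the first non-empty operand, not to a union, so the test needs either separate `in` checks or an explicit union. The sketch below uses plain sets as stand-ins for the session collections, on the assumption that they behave like sets for truthiness and membership.

```python
# Plain sets standing in for session.new / session.dirty / session.deleted.
new = set()
dirty = {"b"}
deleted = {"c"}


def in_any_with_or(item):
    # `new or dirty or deleted` evaluates to `dirty` here, because `new`
    # is empty (falsy) and `dirty` is the first non-empty operand.
    return item in (new or dirty or deleted)


def in_any_separately(item):
    # One membership test per collection covers all three.
    return item in new or item in dirty or item in deleted


print(in_any_with_or("c"))     # False: only `dirty` was consulted
print(in_any_separately("c"))  # True
```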