For caching, I'd use dogpile.cache:  https://bitbucket.org/zzzeek/dogpile.cache/

It is specifically the replacement for Beaker caching, and is much simpler
and more performant.

SQLAlchemy 0.8 will convert the "beaker caching" examples to use dogpile
instead.   Attached is a script from a recent tutorial I gave which
illustrates a typical dogpile/SQLAlchemy caching configuration, in the
spirit of the Beaker caching example.
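For reference, the core model dogpile exposes is a cache "region" whose get_or_create() runs a creator callable only on a miss; the stdlib-only sketch below (class and names are mine, not dogpile's) shows that shape without requiring the library:

```python
# A stdlib-only sketch of the "region" model dogpile uses (not dogpile
# itself): get_or_create() returns the cached value on a hit, and runs
# the creator callable on a miss, storing its result.

class SketchRegion(object):
    def __init__(self):
        self._cache = {}

    def get_or_create(self, key, creator):
        # on a miss, invoke the creator and cache its return value
        if key not in self._cache:
            self._cache[key] = creator()
        return self._cache[key]

    def delete(self, key):
        # invalidate a single key
        self._cache.pop(key, None)

region = SketchRegion()
calls = []

def expensive():
    calls.append(1)
    return "some value"

region.get_or_create("k", expensive)   # miss: creator runs
region.get_or_create("k", expensive)   # hit: creator is not run again
```

The real region in the attached script behaves the same way, with locking and pluggable backends on top.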

On Jun 5, 2012, at 3:36 PM, Learner wrote:

> Thanks Jason. I will take your suggestion.
> But your hint about beaker caching is helpful.
> 
> cheers
> -Bkumar
> 
> On Jun 5, 4:22 pm, Jason <[email protected]> wrote:
>> On Tuesday, June 5, 2012 7:49:10 AM UTC-4, Learner wrote:
>> 
>>> Hello Pyramid gurus,
>> 
>>> I have been searching for quick tutorials on caching, background jobs
>>> & ORM related topics. I found quite a few resources which seem to be
>>> very informative. Since I am new to both Python & Pyramid, I thought I
>>> will seek experienced people opinion, before I go ahead and use
>>> anything I found on web. Any help is very much appreciated.
>> 
>>> 1. Caching:
>>>     The simple use case is:- I want to show top 10 or 20 articles on
>>> my wiki application. Before I render the data I would like to cache
>>> the db result upon first query execution and cache it. Cache to
>>> refresh automatically after every 1 hour or so.
>> 
>> As far as caching is concerned, you will be better off caching the result
>> of your view. Beaker cache has decorators for caching individual
>> functions/methods for a specified period of time (look for the cache_region
>> decorator); this way not only will the database results be cached, but also
>> the processing required to turn them into the template values. I don't know
>> if there is a way to also cache the rendered template with Pyramid.
>> 
>>> 2. Background Jobs: I am using SQLAlchemy in my application. All the data
>>> needed for the application comes from XML/CSV files. Is there any way in
>>> Pyramid I can create a background job and schedule it to run every 30
>>> minutes or so? The job will look at one particular folder every time it
>>> runs, and if there are any xml/csv files the job will pick them up and
>>> process them. Since this is a simple ETL job, SQLAlchemy is not aware of
>>> the DB changes. So does this confuse any of the ORM caching mechanisms and
>>> show dirty data? If so, how would I be able to notify the ORM to rebuild
>>> its caching? Thanks for your time.
>> 
>> Are the XML files parsed and the data then inserted into a database that
>> Pyramid uses? Perhaps a cron job would be better suited to that?  If you
>> are using caching then the data will not be refreshed in Pyramid until the
>> cache refreshes. If you are using beaker you can force the cache to refresh
>> on the next hit.
>> 
>> Are you sure you need all this caching though? It seems unnecessarily
>> complicated. Pyramid is very fast, SQLAlchemy is very fast, your database
>> will probably be caching the query plans as well so it's going to be very
>> fast. I would recommend building your application with no caching, and then
>> adding it later if it is needed. That way you can worry about getting the
>> loaded data displaying correctly (especially since your data setup is a
>> little more complex) before having to figure out a caching system.
>> 
>> -- Jason
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "pylons-discuss" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/pylons-discuss?hl=en.
> 
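The cron approach Jason suggests could look like the following crontab fragment (the script path and name are hypothetical, not from the thread):

```shell
# Hypothetical crontab entry: every 30 minutes, run the import script
# that scans the drop folder for new xml/csv files and loads them.
*/30 * * * * /usr/bin/python /srv/myapp/process_incoming.py
```

Keeping the schedule in cron means the ETL work happens outside the web process, so Pyramid never blocks on it.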

### slide::

#### Transparent Caching ####

# Illustrate using MapperOption objects to send caching directives
# to a custom Query subclass.
#

### slide::

# dogpile.cache is a new caching system which replaces Beaker.
#
# https://bitbucket.org/zzzeek/dogpile.cache
#
# Create a dogpile "cache region"


from dogpile.cache.region import make_region

regions = {
    "default": make_region().configure(
        'dogpile.cache.memory'
        )
    }

### slide:: -*- no_clear -*-

regions['default'].set("some key", "some value")

regions['default'].get("some key")

### slide:: -*- no_clear -*-
regions["default"].backend._cache

### slide::
# the dogpile (and Beaker) model allows you to pass
# a callable that generates a value

def generate_a_value():
    print "generating !"
    return "some value"

regions["default"].get_or_create("some other key", generate_a_value)

### slide:: -*- no_clear -*-

regions["default"].get_or_create("some other key", generate_a_value)

### slide:: -*- no_clear -*-

regions["default"].delete("some other key")
regions["default"].get_or_create("some other key", generate_a_value)

### slide::

# A Query which accesses a Dogpile cache.
# The parameters of the cache are derived partially from the
# structure of the query.

from sqlalchemy.orm.query import Query

class CachingQuery(Query):
    def __iter__(self):
        """override __iter__ to change where data comes from"""
        if hasattr(self, '_cache_region'):
            dogpile_region, cache_key = self._get_cache_plus_key()
            cached_value = dogpile_region.get_or_create(
                                        cache_key, 
                                        lambda: list(Query.__iter__(self))
                                    )
            return self.merge_result(cached_value, load=False)
        else:
            return super(CachingQuery, self).__iter__()

    def _get_cache_plus_key(self):
        """Return a cache region plus key."""
        return \
            regions[self._cache_region.region],\
            _key_from_query(self)

    def invalidate(self):
        """Invalidate the cache value represented by this Query."""
        dogpile_region, cache_key = self._get_cache_plus_key()
        dogpile_region.delete(cache_key)

### slide::

# the "key" for our cache will be based on the structure
# of the Query.   We define a helper that will give us
# all the "bind" values from a particular Query object.

from sqlalchemy.sql import visitors
import hashlib

def _key_from_query(query):
    """Given a Query, extract all bind parameter values from
    its structure, and combine them with a digest of the
    statement text to form a cache key."""

    v = []
    def visit_bindparam(bind):
        # favor parameter values set on the Query itself,
        # then a bind callable, then the bind's literal value
        if bind.key in query._params:
            value = query._params[bind.key]
        elif bind.callable:
            value = bind.callable()
        else:
            value = bind.value

        v.append(unicode(value))

    stmt = query.statement
    visitors.traverse(stmt, {}, {'bindparam': visit_bindparam})
    return " ".join(
                [hashlib.md5(unicode(stmt).encode('utf-8')).hexdigest()] + v
            )

### slide::

# a brief example illustrating _key_from_query

from sqlalchemy.orm import Session
from sqlalchemy.sql import table, column

t1 = table('t1', column('a'), column('b'))

q = Session().query(t1).filter(t1.c.a=='test').filter(t1.c.b=='bar')

print q.statement

_key_from_query(q)

### slide::

# We now define a MapperOption that will place attributes 
# onto the query.

from sqlalchemy.orm.interfaces import MapperOption

class FromCache(MapperOption):
    """Specifies that a Query should load results from a cache."""

    propagate_to_loaders = False

    def __init__(self, region="default"):
        self.region = region

    def process_query(self, query):
        query._cache_region = self


### slide::

# Set up a session and base with CachingQuery

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite://')
Session = scoped_session(sessionmaker(engine, query_cls=CachingQuery))

Base = declarative_base()

### slide:: -*-no_exec-*-

# A model and some test data.

from sqlalchemy import Integer, String, Column, ForeignKey
from sqlalchemy.orm import relationship

class Widget(Base):
    __tablename__ = 'widget'
    id = Column(Integer, primary_key=True)
    data = Column(String)

class SubWidget(Base):
    __tablename__ = 'subwidget'
    id = Column(Integer, primary_key=True)
    data = Column(String)
    widget_id = Column(Integer, ForeignKey('widget.id'))
    widget = relationship(Widget, backref="subwidgets")

Base.metadata.create_all(engine)

Session.add_all([
    Widget(data='w1', subwidgets=[SubWidget(data='s1'), 
                                SubWidget(data='s2')]),
    Widget(data='w2', subwidgets=[SubWidget(data='s3')])
])
Session.commit()

### slide:: -*-no_exec-*-

# Load SubWidgets, place the results in cache

Session.query(SubWidget).\
                options(FromCache()).\
                join(SubWidget.widget).\
                filter(Widget.data=='w1').\
                all()

### slide:: -*- no_clear -*-
regions["default"].backend._cache

### slide:: -*-no_exec-*-

# On a second run, the results come from the cache.

Session.query(SubWidget).\
                options(FromCache()).\
                join(SubWidget.widget).\
                filter(Widget.data=='w1').\
                all()


### slide:: -*-no_exec-*-

# a new Query with the same form will produce the same
# cache key.   Using a new Query object we can 
# call invalidate() to remove the previously cached
# value.

q = Session.query(SubWidget).\
            options(FromCache()).\
            join(SubWidget.widget).\
            filter(Widget.data=='w1')

q.invalidate()
q.all()

### slide::

# This is a variant on the FromCache option, which will affect
# specifically the query that is invoked within a lazy load.

class RelationshipCache(MapperOption):
    """Specifies that a Query as called within a "lazy load" 
       should load results from a cache."""

    propagate_to_loaders = True

    def __init__(self, attribute, region="default"):
        self.region = region
        self.cls_ = attribute.property.parent.class_
        self.key = attribute.property.key

    def process_query_conditionally(self, query):
        if query._current_path:
            mapper, key = query._current_path[-2:]
            if issubclass(mapper.class_, self.cls_) and \
                key == self.key:
                query._cache_region = self

### slide:: -*-no_exec-*-

# To illustrate, we'll load a SubWidget using our option,
# then load its "widget".  The lazyload for "widget" is 
# cached.   The recipe requires that the object isn't already
# in the session, so start clean.  Or you will go crazy.

Session.remove()

s1 = Session.query(SubWidget).\
            filter(SubWidget.data=='s1').\
            options(RelationshipCache(SubWidget.widget)).\
            one()
s1.widget

### slide:: -*-no_exec-*-

# Now Widget(data="w1") is cached by id.  A brand new session will
# lazyload, but the result pulls from cache.

Session.remove()

s2 = Session.query(SubWidget).\
            filter(SubWidget.data=='s2').\
            options(RelationshipCache(SubWidget.widget)).\
            one()
s2.widget

### slide::