Re: [Sqlalchemy-users] ProxyList proposal

Michael Bayer Sat, 15 Jul 2006 10:57:51 -0700

it seems there are two proposals here:

1. being able to stick a cache in between the identity map check and an actual SQL get operation

2. supporting proxy classes for a select() operation. by proxy classes i mean what they do in hibernate here:

http://www.hibernate.org/hib_docs/v3/reference/en/html/performance.html#performance-fetching-proxies

in other words, you don't really need a custom list class to do this; you just need to override the method Mapper uses to instantiate an object from a result set, so that it returns a "proxy" object containing only identifying information. then when you do anything with it, it loads all of its properties, or otherwise changes its class to be the desired object. i'm pretty sure python will allow whatever scheme we can think of for this kind of thing.
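
a rough sketch of that proxy idea, in plain python with no SQLAlchemy involved. GroupProxy, the loader callable and the dict it returns are all made-up names for illustration, not anything the Mapper provides; the only point is that an object can start out holding just its id and swap its class on first real use.

```python
class Group(object):
    """Stand-in for a mapped class; not connected to any real mapper."""

    def __repr__(self):
        return "<Group %s>" % self.id


class GroupProxy(object):
    """Holds only the primary key until some other attribute is requested."""

    def __init__(self, id, loader):
        self.id = id
        # hypothetical callable: id -> dict of the remaining column values
        self._loader = loader

    def __repr__(self):
        return "<Group [Proxy] %s>" % self.id

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so 'id' and
        # '_loader' never reach this point while the proxy is unloaded.
        if name.startswith('_'):
            raise AttributeError(name)
        state = self._loader(self.id)
        del self.__dict__['_loader']
        self.__dict__.update(state)
        # become the real class; from here on this is an ordinary Group
        self.__class__ = Group
        return getattr(self, name)
```

reading .id or calling repr() never touches the loader; the first access to any other attribute calls it exactly once, after which the object is a plain Group.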

i say this because it seems like you are not trying to optimize the fact that we select identifying rows from the database; only that we don't actually load the entire row until it's needed. whether or not this works for you depends on what specifically you're looking to optimize; for example, if a relation contains 50 million objects, you're still loading 50 million rows, just a lot fewer columns from each row.

you could sort of get a lot of this behavior just by using "deferred" columns, but i realize that doesn't interact with the caching layer the way you're looking for.

my instinct is that a MapperExtension that overrides create_instance() to provide this proxying object, which then knows how to check the cache before calling session._get(), could do the whole thing. the entire feature could be presented as a relatively short module in sqlalchemy.ext that only involves specifying this particular MapperExtension. this would also eliminate any problems with appending or interaction with other objects inside of lists which contain these objects, and also can be used in all sorts of scenarios, eager loading, lazy loading, standalone select() operations, etc. the extension might also need a little help in preventing all the extra columns from being added to the query, which would amount to a small hook placed on line 363 of query.py (where oddly enough i commented a while back that a plugin should be allowed there).

to enable it only for a particular relation() on a class, it would probably involve specifying an explicit non-primary mapper to that relation() which uses this MapperExtension, i.e.:

proxymapper = class_mapper(Group).options(extension(ProxyExtension()))

mapper(SomeClass, sometable, properties={
    'groups': relation(proxymapper)
})

it's possible i might have to get the extension() option working, since i haven't tested that in a while.

so i think the concept is desirable; i'd just like to do it in the simplest and least intrusive way possible. it seems there might be some value in inserting a mapper extension call around line 273 of query.py to allow custom cache implementations to return cached instances, although this particular feature wouldn't need that, since the proxy object can fetch its data from whatever source it wants.
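
to make the ordering concrete, here is a minimal sketch of a get that consults a cache between the identity map and the database. it is not how query.py is written; CachingGetter, the dict-like cache and the sql_get callable are assumptions for illustration only.

```python
class CachingGetter(object):
    """Lookup order: identity map, then cache, then the database."""

    def __init__(self, cache, sql_get):
        self.identity_map = {}
        self.cache = cache        # hypothetical dict-like cache
        self.sql_get = sql_get    # hypothetical callable: key -> instance

    def get(self, key):
        # 1. already loaded in this session?
        if key in self.identity_map:
            return self.identity_map[key]
        # 2. the hook: give the cache a chance before issuing SQL
        instance = self.cache.get(key)
        if instance is None:
            # 3. fall back to the actual SQL get and remember the result
            instance = self.sql_get(key)
            self.cache[key] = instance
        self.identity_map[key] = instance
        return instance
```

a cache hit skips sql_get entirely, and a second get for the same key is answered by the identity map without consulting the cache at all.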

On Jul 15, 2006, at 4:04 AM, Michael Carter wrote:

Hello,

Let me apologize up front about the length of this post. I am doing my best to describe the behaviour I want so there won't be more confusion than there needs to be.

While working on my caching strategies I implemented something I call a ProxyList. For now, a ProxyList is a Many-One Join. Depending on its initialization parameters, it can return the proper select statement when asked. The job of the ProxyList is to store a list of ids (and only ids) that represent the objects in the list. The caching then occurs at two levels. First, when you access an item in the list, the identity map is checked, then the cache, and finally a session.get is issued. Second, the results of the select (as a list of ids) are cached. So when the list is first accessed (via a property on the parent object), the cache is checked for the stored relation, and on a miss the actual select is issued (and then cached for future use). One final caveat is that when this ProxyList is created on the One class, the Many class has some hidden attributes adjusted so that whenever the column corresponding to that join is changed on a Many-object, the cached select of the previous One object and the cached select of the new One object are both expired.

Let me try to make this clearer. My current API has the relations being declared somewhat like ActiveMapper (and likewise being processed via a meta-class). Also, the below example ignores certain implementation details, but the point should be clear.

# tables
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String))

groups = Table('groups', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('owner_id', Integer, ForeignKey('users.id')))

class User:
    # Creates a ProxyList
    groups = ManyJoin('Group', 'owner_id')

class Group:
    def __repr__(self): return "<Group %s>" % (self.id or '-')

User.mapper = mapper(User, users)
Group.mapper = mapper(Group, groups)

# Create some initial data
>>> session.save(User())
>>> session.save(User())
>>> session.save(Group())
>>> session.save(Group())
>>> session.flush()
>>> u1, u2 = User.get(1), User.get(2)
>>> u1.name = 'Michael'
>>> u2.name = 'Jon'
>>> g1, g2 = Group.get(1), Group.get(2)
>>> g1.owner_id = 1
>>> g2.owner_id = 1
>>> session.flush(), session.clear()

# So now, here is how the ProxyList looks
>>> u1 = User.get(1)
>>> u1.groups
[ <Group [Proxy] 1>, <Group [Proxy] 2> ]
# Note, Group 1 and Group 2 aren't actually in the identity map
# The next statement actually loads a Group object
>>> u1.groups[0]
<Group 1>
>>> u1.groups[1]
<Group 2>

# Now lets change the owner_id on a group
>>> u1.groups[0].owner_id = 2
# At this point, User 1's proxy list is invalid, and it gets expired at the next flush.
>>> session.flush()

I've glossed over a bit in the mock python session above, like the fact that I can't call session.flush() directly for my cache expiring code to run. But what's important is how the ProxyList behaves: it only loads a list of ids, and only when necessary. Then, it only loads actual objects from those ids when specifically asked (getitem or an iterator).
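
As a minimal sketch of that behaviour (not my actual implementation), the class below keeps only ids and defers both the select and the per-object loads. The select_ids and get_by_id callables, the cache key and the dict-like relation cache are hypothetical stand-ins for the real select, the identity-map/cache/session.get lookup, and the cache backend.

```python
class ProxyList(object):
    """Stores ids only; objects are fetched on getitem or iteration."""

    def __init__(self, key, select_ids, get_by_id, relation_cache):
        self._key = key                  # e.g. ('User', 1, 'groups')
        self._select_ids = select_ids    # hypothetical: () -> iterable of ids
        self._get_by_id = get_by_id      # hypothetical: id -> object
        self._cache = relation_cache     # hypothetical dict-like cache
        self._ids = None                 # nothing loaded yet

    def _load_ids(self):
        if self._ids is None:
            # check the cached relation first; select only on a miss
            ids = self._cache.get(self._key)
            if ids is None:
                ids = list(self._select_ids())
                self._cache[self._key] = ids
            self._ids = ids
        return self._ids

    def __len__(self):
        return len(self._load_ids())

    def __getitem__(self, index):
        # integer indexes only in this sketch; slices are not handled
        return self._get_by_id(self._load_ids()[index])

    def __iter__(self):
        for id_ in self._load_ids():
            yield self._get_by_id(id_)
```

Creating the list issues nothing. The first len(), index or iteration runs the select (or reads the cached id list), and each object is only fetched when its position is actually accessed.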

I've also left out how appending works. For now it only happens by changing the foreign key reference on the many object directly (setting owner_id in the above example). This wouldn't work so well for related joins.

While I'm comfortable with my implementation and it seems to be working out for me, it is very limited by how it goes behind SA's back, so to speak. I would much prefer to do something that integrates more seamlessly with SA rather than hacking around it. And I want to extend this behaviour to Many-Many Joins and even custom joins. I'm happy taking care of the cache checking/expiring in my own code, but it seems like this ProxyList idea could have a home in SA. If there were some hooks in it I could get my caching code in there pretty easily. I would need a way to define my own get for the property that accesses the list (to instantiate a list based on cache data if possible rather than db data), and a hook in getitem so I could first check the cache. As for expiring the proper items in the cache, I can do as I do now and just grab a copy of the new and dirty lists right before a flush, and then post-flush I can expire the proper ProxyList relations from the cache based on those new/dirty items.
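
The expiry step could look roughly like the sketch below. It assumes the foreign key changes have already been captured before the flush as (old, new) pairs; relation_key, expire_relations and the shape of the cache keys are made up for illustration and are not part of SA.

```python
def relation_key(owner_id):
    # hypothetical cache key for one User's cached list of group ids
    return ('User', owner_id, 'groups')


def expire_relations(cache, changes):
    """Drop cached id lists affected by foreign key changes.

    changes is an iterable of (old_fk, new_fk) pairs collected from the
    new and dirty objects right before the flush; None means "no owner".
    Returns the set of keys that were expired.
    """
    expired = set()
    for old_fk, new_fk in changes:
        if old_fk == new_fk:
            continue  # the relation did not move, nothing to expire
        # both the previous owner's list and the new owner's list are stale
        for fk in (old_fk, new_fk):
            if fk is not None:
                key = relation_key(fk)
                cache.pop(key, None)
                expired.add(key)
    return expired
```

For the mock session above, moving Group 1 from User 1 to User 2 would be the pair (1, 2), which expires the cached lists of both users.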

So for those of you who made it this far, what do you think? Am I missing a much more obvious way of doing this? I looked at the docs for custom list classes but they didn't seem like what I wanted. Any suggestions and criticisms are very welcome.

-Michael Carter

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Sqlalchemy-users mailing list
Sqlalchemy-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users
