Re: [SQLObject] Rationale: Support for cache distribution delegates in SQLObject

David Faure Tue, 07 Mar 2006 05:50:10 -0800

> > So then, what are the data access modes currently supported? (or at
> > least proposed to be supported)
> >
> > a) more consistent read: "select ... for update" (proposed)
> > b) less consistent read: "select"
> > c) even less consistent read: from cache
> >
> > There is a tension between (b) and (c):  select from the database
> > server will provide access to recent updates from other clients,
> > while the cache will provide significantly better performance.
>
> The differences between read consistency do not occur when running  
> in a SERIALIZEABLE transaction isolation level, which is currently the  
> default with SQLObject and PostgreSQL (as it is the default setting with  
> psycopg1). This, and ensuring the caches are destroyed at the end of each  
> transaction, is how we are keeping both read consistency and cache speed.


I was not aware that the current cache is invalidated after each  
transaction.  That explains a lot; otherwise I couldn't see using it  
at all in a multi-client environment.

> > We should ask the question, "Can we have the performance of the cache
> > with inclusion of recent updates from other clients?"
> >
> > If so, then the need for (b) diminishes and we can rely more and more
> > on (a) and on (c).
>
> Using FOR UPDATE should be kept to a minimum. Using it increases  
> the chances
> of deadlocks from other processes. Using it as a mechanism to  
> ensure your
> cache is correct is somewhat self defeating as other processes will be
> blocked waiting for the locks.

http://www.postgresql.org/docs/7.4/static/transaction-iso.html

"Serializeable" doesn't seem to be very performance friendly, and can  
result in a transaction being selected for rollback.  I don't see  
this as much better than the instant deadlock detection and rollback  
associated with "select ... for update" in MySQL / InnoDB.  In any  
event, "for update" is about "update" correctness within the  
transaction and not about cache correctness.  It is not possible to  
select a value and then later update it in the same transaction,  
while maintaining ACID properties, without "for update."  I think  
this is still true while using PostgreSQL "serializeable."

>
> > We (myself and David Faure) are proposing that SQLObject support
> > cache mechanisms that have the following properties:
> >
> > - when an object is persisted to the database, a registered delegate
> > may distribute that object update to other client caches, either
> > directly or via one or more cache manager processes
> > - when an object is requested from the cache, a registered delegate
> > handles the actual lookup from a potentially user-supplied cache
> > mechanism, e.g., one that uses shared memory
> >
> > In our particular application, we want each SQLObject update to
> > notify one cache manager process per server that hosts clients of the
> > "object database."  Each cache manager populates a (single writer,
> > many readers) shared memory database of objects for that server.
> > Each SQLObject client has the option of fetching object state from
> > the shared memory.
> >
> > Of course the API for these delegates should be carefully
> > considered.  For example, it would probably be good if the update
> > notification delegate were not triggered until after committing the
> > transaction that touched the object(s) in question, so as not to
> > advertise object state that never really existed.
> >
> > I would be interested in thoughts on this approach, and in particular
> > in arguments against it.  I would hope that the latter would be
> > accompanied by a better suggestion for how to examine recent state of
> > SQLObjects in a distributed system with queries outweighing data
> > modifications by at least 50000 to 1.
>
> In some ways, it seems like you are building into the application layer a
> work around for a lack in your database layer (no SERIALIZABLE transaction
> isolation level in MySQL?).

Our distributed cache is about a drastic improvement in "select"  
performance (not "select for update") where our consistency model is  
that we want "some recent version of the object."  It is no stricter  
than that.  Our shared memory cache mechanism can support a million  
such queries per second, per CPU.  Of course after binding that to  
SQLObject it will be much slower, but still a lot faster than message  
passing to a networked database or even to a local replica.

> However, what you do gain is the ability to keep
> your cache alive across transaction boundaries *if* you only have data
> changes being made through processes using SQLObject and your  
> shared cache.

This is a very good point (updates from non-SQLObject users), and  
though I don't currently have any plans for non-SQLObject clients, it  
raises the question of whether the cache on each node can be  
replicated by a database trigger (from either the master or a local  
slave replica) rather than by an SQLObject delegate.

We are at MySQL 4.1 but I think 5.x has this functionality.  We would  
need to look at whether we want to migrate.

If the cache can get populated "some other way," then all we are  
looking for is the ability to construct a "read only" version of an  
SQLObject from our custom cache.

> It seems a decent approach if replication cannot meet your needs.  
> Does MySQL
> give you two phase commit? If not, you are going to have a race  
> condition
> that may or may not be important in your application (actually -  
> that could
> be avoided if the updater could lock the cache during commit to the  
> RDB).
>
> Is your high query load due to the volume of transactions, or  
> SQLObjects
> natural desire to issue 10,000 individual queries where one  
> intelligent JOIN
> would do? We are finding that identifying and optimizing  
> transactions that
> are issuing too many queries are giving us better improvements than  
> any
> amount of caching or improved hardware could.

Thanks for the tip, but in this case we need raw throughput on the  
number of simple objects we can look up a "recent version" of without  
network activity or other message passing.  If the lookup then  
results in a "probable" need to "do something," then we will do  
something.  That "something" will use normal select, select .. for  
update, update, insert, delete depending on what is going on.  Part  
of what "something" entails is examining whether the "probable"  
condition (from the cache) is "really" the current transactionally- 
consistent condition.  This is accomplished with a regular-way select  
or select .. for update.
>
> You may want to investigate ZODB and ZEO as your object cache. If  
> it meets
> your performance needs you might find all the heavy lifting is  
> already done
> for you, and replacing the cache simply a matter of setting up a  
> RAM based
> ZEO store and per server caches and replacing SQLObjects cache with  
> an ZODB
> backed OOBTree of IOBTree.

I did look into ZODB and I didn't think that it supported ACID  
properties through-and-through:

http://www.python.org/~jeremy/weblog/030514.html

That is, when using the equivalent of "select .. for update" (if  
there is one) and the various modifying operations, do I get the same  
ACID properties that I am used to in MySQL.  If not, we can't really  
afford to re-think the application structure at this point.  Other  
factors include:

- I like the transparency of the tabular (versus object)  
representation; it makes ad-hoc queries much easier
- I want to leave the door open for an upgrade to MySQL's NDB  
(Cluster) engine if/when they add the features we need.  This both  
for scalability and for performance.

In summary, I think we should pursue thinking about using a replica  
trigger to populate our cache.  It does seem to be cleaner and is  
certainly less of a modification to SQLObject.

The remaining item on the agenda is then the ability to populate an  
SQLObect from a user-supplied delegate.  Not being extremely familiar  
with the tool, I'm not sure exactly what I mean by that.

-John

David (still posting this on behalf of John Muehlhausen <[EMAIL PROTECTED]>, 
whom the mailinglist software doesn't seem to want to subscribe).

-- 
David Faure -- [EMAIL PROTECTED], [EMAIL PROTECTED]
KDE/KOffice developer, Qt consultancy projects
Klarälvdalens Datakonsult AB, Platform-independent software solutions


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid0944&bid$1720&dat1642
_______________________________________________
sqlobject-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss

Re: [SQLObject] Rationale: Support for cache distribution delegates in SQLObject

Reply via email to