> > So then, what are the data access modes currently supported? (or at > > least proposed to be supported) > > > > a) more consistent read: "select ... for update" (proposed) > > b) less consistent read: "select" > > c) even less consistent read: from cache > > > > There is a tension between (b) and (c): select from the database > > server will provide access to recent updates from other clients, > > while the cache will provide significantly better performance. > > The differences between read consistency do not occur when running > in a SERIALIZEABLE transaction isolation level, which is currently the > default with SQLObject and PostgreSQL (as it is the default setting with > psycopg1). This, and ensuring the caches are destroyed at the end of each > transaction, is how we are keeping both read consistency and cache speed.
I was not aware that the current cache is invalidated after each transaction. That explains a lot; otherwise I couldn't see using it at all in a multi-client environment. > > We should ask the question, "Can we have the performance of the cache > > with inclusion of recent updates from other clients?" > > > > If so, then the need for (b) diminishes and we can rely more and more > > on (a) and on (c). > > Using FOR UPDATE should be kept to a minimum. Using it increases > the chances > of deadlocks from other processes. Using it as a mechanism to > ensure your > cache is correct is somewhat self defeating as other processes will be > blocked waiting for the locks. http://www.postgresql.org/docs/7.4/static/transaction-iso.html "Serializeable" doesn't seem to be very performance friendly, and can result in a transaction being selected for rollback. I don't see this as much better than the instant deadlock detection and rollback associated with "select ... for update" in MySQL / InnoDB. In any event, "for update" is about "update" correctness within the transaction and not about cache correctness. It is not possible to select a value and then later update it in the same transaction, while maintaining ACID properties, without "for update." I think this is still true while using PostgreSQL "serializeable." > > > We (myself and David Faure) are proposing that SQLObject support > > cache mechanisms that have the following properties: > > > > - when an object is persisted to the database, a registered delegate > > may distribute that object update to other client caches, either > > directly or via one or more cache manager processes > > - when an object is requested from the cache, a registered delegate > > handles the actual lookup from a potentially user-supplied cache > > mechanism, e.g., one that uses shared memory > > > > In our particular application, we want each SQLObject update to > > notify one cache manager process per server that hosts clients of the > > "object database." Each cache manager populates a (single writer, > > many readers) shared memory database of objects for that server. > > Each SQLObject client has the option of fetching object state from > > the shared memory. > > > > Of course the API for these delegates should be carefully > > considered. For example, it would probably be good if the update > > notification delegate were not triggered until after committing the > > transaction that touched the object(s) in question, so as not to > > advertise object state that never really existed. > > > > I would be interested in thoughts on this approach, and in particular > > in arguments against it. I would hope that the latter would be > > accompanied by a better suggestion for how to examine recent state of > > SQLObjects in a distributed system with queries outweighing data > > modifications by at least 50000 to 1. > > In some ways, it seems like you are building into the application layer a > work around for a lack in your database layer (no SERIALIZABLE transaction > isolation level in MySQL?). Our distributed cache is about a drastic improvement in "select" performance (not "select for update") where our consistency model is that we want "some recent version of the object." It is no stricter than that. Our shared memory cache mechanism can support a million such queries per second, per CPU. Of course after binding that to SQLObject it will be much slower, but still a lot faster than message passing to a networked database or even to a local replica. > However, what you do gain is the ability to keep > your cache alive across transaction boundaries *if* you only have data > changes being made through processes using SQLObject and your > shared cache. This is a very good point (updates from non-SQLObject users), and though I don't currently have any plans for non-SQLObject clients, it raises the question of whether the cache on each node can be replicated by a database trigger (from either the master or a local slave replica) rather than by an SQLObject delegate. We are at MySQL 4.1 but I think 5.x has this functionality. We would need to look at whether we want to migrate. If the cache can get populated "some other way," then all we are looking for is the ability to construct a "read only" version of an SQLObject from our custom cache. > It seems a decent approach if replication cannot meet your needs. > Does MySQL > give you two phase commit? If not, you are going to have a race > condition > that may or may not be important in your application (actually - > that could > be avoided if the updater could lock the cache during commit to the > RDB). > > Is your high query load due to the volume of transactions, or > SQLObjects > natural desire to issue 10,000 individual queries where one > intelligent JOIN > would do? We are finding that identifying and optimizing > transactions that > are issuing too many queries are giving us better improvements than > any > amount of caching or improved hardware could. Thanks for the tip, but in this case we need raw throughput on the number of simple objects we can look up a "recent version" of without network activity or other message passing. If the lookup then results in a "probable" need to "do something," then we will do something. That "something" will use normal select, select .. for update, update, insert, delete depending on what is going on. Part of what "something" entails is examining whether the "probable" condition (from the cache) is "really" the current transactionally- consistent condition. This is accomplished with a regular-way select or select .. for update. > > You may want to investigate ZODB and ZEO as your object cache. If > it meets > your performance needs you might find all the heavy lifting is > already done > for you, and replacing the cache simply a matter of setting up a > RAM based > ZEO store and per server caches and replacing SQLObjects cache with > an ZODB > backed OOBTree of IOBTree. I did look into ZODB and I didn't think that it supported ACID properties through-and-through: http://www.python.org/~jeremy/weblog/030514.html That is, when using the equivalent of "select .. for update" (if there is one) and the various modifying operations, do I get the same ACID properties that I am used to in MySQL. If not, we can't really afford to re-think the application structure at this point. Other factors include: - I like the transparency of the tabular (versus object) representation; it makes ad-hoc queries much easier - I want to leave the door open for an upgrade to MySQL's NDB (Cluster) engine if/when they add the features we need. This both for scalability and for performance. In summary, I think we should pursue thinking about using a replica trigger to populate our cache. It does seem to be cleaner and is certainly less of a modification to SQLObject. The remaining item on the agenda is then the ability to populate an SQLObect from a user-supplied delegate. Not being extremely familiar with the tool, I'm not sure exactly what I mean by that. -John David (still posting this on behalf of John Muehlhausen <[EMAIL PROTECTED]>, whom the mailinglist software doesn't seem to want to subscribe). -- David Faure -- [EMAIL PROTECTED], [EMAIL PROTECTED] KDE/KOffice developer, Qt consultancy projects Klarälvdalens Datakonsult AB, Platform-independent software solutions ------------------------------------------------------- This SF.Net email is sponsored by xPML, a groundbreaking scripting language that extends applications into web and mobile media. Attend the live webcast and join the prime developer group breaking into this new coding territory! http://sel.as-us.falkag.net/sel?cmd=lnk&kid0944&bid$1720&dat1642 _______________________________________________ sqlobject-discuss mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/sqlobject-discuss
