Just to throw in my 2 cents, I think caching should be a central feature of the datastore implementation. Unlike a traditional environment where costs are pre-paid in the initial capital outlay, every inefficiency on GAE costs real money, and depending on the level of inefficiency, can add up to orders of magnitude more money. Take reading records, for example. If you hit the datastore every time you needed a record, it could cost you many times more in datastore fees once you have a lot of users. Similarly for writing: if you sent a write request every time some data changed, you would pay many times more than if you queued 100 requests and sent them in one big batch.
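To make the read side concrete, here's a minimal read-through cache sketch. All names here are hypothetical, and plain dicts stand in for both memcache and the datastore - this is just to illustrate the cost argument, not an OpenBD API:

```python
# Illustrative only: a dict stands in for the datastore and another for memcache.
class ReadThroughCache:
    def __init__(self, backend):
        self.backend = backend          # stand-in for the datastore
        self.cache = {}                 # stand-in for memcache
        self.datastore_reads = 0        # count of billable reads

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # served from cache: no datastore fee
        self.datastore_reads += 1       # billable datastore read
        value = self.backend[key]
        self.cache[key] = value         # populate cache for next time
        return value

store = ReadThroughCache({"user:1": {"name": "Baz"}})
for _ in range(1000):
    store.get("user:1")
print(store.datastore_reads)  # 1 -- a single billable read serves 1000 requests
```

The same 1000 requests without the cache would be 1000 billable reads - that's the multiplier I'm talking about.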
It would be irresponsible not to pre-optimize your datastore interactions through a cache. There are serious performance implications as well. The datastore has an awesome characteristic: it processes a query in the same amount of time regardless of how many records the datastore contains, be it 10 or 10 million. Execution time only increases as the number of *results* being returned increases. That kind of consistency on every request is really amazing, but it comes with a downside: the small queries that a well-indexed relational DB returns instantly still take that fixed amount of time with Google. So for large recordsets the datastore outperforms, but for smaller ones it won't be as snappy as you're used to. A caching layer would of course allow you to read from the datastore once, then serve subsequent requests from the super-fast cache.

Another important point is threading, or the lack thereof. Google App Engine does not support creating additional threads during a single request. So if for a certain request you needed to run several long queries in parallel in their own threads, you wouldn't be able to. You would have to run them sequentially in GAE and most likely exceed your request time limit. Having a super-fast cache to read from mitigates this limitation.

All this to say that caching with the datastore is much more important than in a regular environment - so much so that I think it should be baked right into OpenBD's datastore solution and turned on by default. The administrator could even have settings to manage the cache, like "max queue size for batch write" and "max minutes between writes". To disable it you would change those settings from their defaults to zero. The caching layer could even be invisible to the application.
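Those two settings could drive a simple write broker. Here's a hedged sketch of what I mean - all names are made up, seconds are used instead of minutes for brevity, and the flush is a stub where a real implementation would issue the datastore's batch put:

```python
import time

# Hypothetical write broker illustrating the two admin settings above.
class BatchWriter:
    def __init__(self, max_queue_size=100, max_seconds_between_writes=60):
        self.max_queue_size = max_queue_size
        self.max_seconds = max_seconds_between_writes
        self.queue = []
        self.last_flush = time.time()
        self.batches_sent = 0           # count of batch requests actually sent

    def write(self, entity):
        self.queue.append(entity)
        if (len(self.queue) >= self.max_queue_size
                or time.time() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.queue:
            # one batch put instead of len(self.queue) individual writes
            self.batches_sent += 1
            self.queue = []
        self.last_flush = time.time()

w = BatchWriter(max_queue_size=100)
for i in range(250):
    w.write({"id": i})
w.flush()  # drain the remainder, e.g. at end of request
print(w.batches_sent)  # 3 -- instead of 250 separate write requests
```

Setting both thresholds to zero would make every write flush immediately, which is exactly the "caching disabled" behavior I described.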
The same functions could be used, like GoogleWrite() or GoogleRead(), but behind the scenes OpenBD would intelligently broker requests between the application, the cache and the datastore based on a few simple rules. The details of when to cache, how to cache, synchronizing what's dirty and what's not, and queuing and sending batch requests would all be hidden from the developer.

In this case we are lucky that GAE and the datastore have the limitations that they do. Since there are no joins, complex query functions, groupings, or variations in performance based on the nature of your data to worry about, it seems feasible to develop a caching algorithm that is optimal for any system. Similarly, there is only one viable caching technology to choose from - one developed precisely for this use case, and one with an API: memcache. It would be the obvious choice of cache technology regardless of how the cache is implemented. The logic and choices behind implementing a caching layer on GAE are relatively straightforward compared to a more open environment. That reduces the innovation and customization that would be possible if caching were left up to the application to manage, and therefore reduces the value of implementing it there. Why re-invent the wheel for each app?

Baz

On Sun, Jun 7, 2009 at 12:09 PM, Vince Bonfanti <[email protected]> wrote:

> I don't know yet. Those are all very good questions. My plans are to finish
> up the virtual file system implementation this week (adding support for
> CFDIRECTORY, CFFILE, CFCONTENT, FileExists, DirectoryExists, and maybe some
> other related tags/functions), then revisit the datastore layer, including
> caching and some of the other items we've been discussing.
>
> Vince
>
> On Sun, Jun 7, 2009 at 1:44 PM, Baz <[email protected]> wrote:
>
>> If I query and cache 10 records, and then on another request I try to
>> query just one of those records, will openbd give me the record from the
>> cache or re-query the datastore? Similarly, if I were writing an object to
>> the datastore, will the cache have the ability to queue up that write
>> request until there were enough to send a single, more efficient batch
>> request?
>>
>> Baz
>>
>> On Sun, Jun 7, 2009 at 7:10 AM, Vince Bonfanti <[email protected]> wrote:
>>
>>> Yes.
>>>
>>> On Sun, Jun 7, 2009 at 5:24 AM, Baz <[email protected]> wrote:
>>>
>>>> Vince, are you planning on implementing a caching layer on top of the
>>>> google datastore using the memcache APIs?

--~--~---------~--~----~------------~-------~--~----~
Open BlueDragon Public Mailing List
http://groups.google.com/group/openbd?hl=en
official site @ http://www.openbluedragon.org/
!! save a network - trim replies before posting !!
-~----------~----~----~----~------~----~------~--~---
