Just to throw in my 2 cents, I think caching should be a central feature of the datastore implementation. Unlike a traditional environment where costs are pre-paid in the initial capital outlay, every inefficiency on GAE costs real money, and depending on the level of inefficiency, can add up to orders of magnitude more money. Take reading records, for example. If you hit the datastore every time you needed a record, it could cost you many times more in datastore fees once you have a lot of users. Similarly for writing: if you sent a write request every time some data changed, you would pay many times more than if you queued 100 requests and sent them in one big batch.
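To make the read side concrete, here's a minimal read-through cache sketch. All names here are hypothetical, and plain dicts stand in for both memcache and the datastore - this is just to illustrate the cost argument, not an OpenBD API:

```python
# Illustrative only: a dict stands in for the datastore and another for memcache.
class ReadThroughCache:
    def __init__(self, backend):
        self.backend = backend          # stand-in for the datastore
        self.cache = {}                 # stand-in for memcache
        self.datastore_reads = 0        # count of billable reads

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # served from cache: no datastore fee
        self.datastore_reads += 1       # billable datastore read
        value = self.backend[key]
        self.cache[key] = value         # populate cache for next time
        return value

store = ReadThroughCache({"user:1": {"name": "Baz"}})
for _ in range(1000):
    store.get("user:1")
print(store.datastore_reads)  # 1 -- a single billable read serves 1000 requests
```

The same 1000 requests without the cache would be 1000 billable reads - that's the multiplier I'm talking about.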
It would be irresponsible not to pre-optimize your datastore interactions through a cache. There are serious performance implications as well. The datastore has an awesome characteristic: it processes a query in the same amount of time regardless of how many records the datastore contains, be it 10 or 10 million. Execution time only increases as the number of *results* being returned increases. That kind of consistency on every request is really amazing, but it comes with a downside: the small queries that a well-indexed relational DB returns instantly still take that fixed amount of time with Google. So for large recordsets the datastore outperforms, but for smaller ones it won't be as snappy as you're used to. A caching layer would of course allow you to read from the datastore once, then serve subsequent requests from the super-fast cache.

Another important point is threading, or the lack thereof. Google App Engine does not support creating additional threads during a single request. So if for a certain request you needed to run several long queries in parallel in their own threads, you wouldn't be able to. You would have to run them sequentially in GAE and most likely exceed your request time limit. Having a super-fast cache to read from mitigates this limitation.

All this to say that caching with the datastore is much more important than in a regular environment - so much so that I think it should be baked right into OpenBD's datastore solution and turned on by default. The administrator could even have settings to manage the cache, like "max queue size for batch write" and "max minutes between writes". To disable it you would change those settings from their defaults to zero. The caching layer could even be invisible to the application.
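Those two settings could drive a simple write broker. Here's a hedged sketch of what I mean - all names are made up, seconds are used instead of minutes for brevity, and the flush is a stub where a real implementation would issue the datastore's batch put:

```python
import time

# Hypothetical write broker illustrating the two admin settings above.
class BatchWriter:
    def __init__(self, max_queue_size=100, max_seconds_between_writes=60):
        self.max_queue_size = max_queue_size
        self.max_seconds = max_seconds_between_writes
        self.queue = []
        self.last_flush = time.time()
        self.batches_sent = 0           # count of batch requests actually sent

    def write(self, entity):
        self.queue.append(entity)
        if (len(self.queue) >= self.max_queue_size
                or time.time() - self.last_flush >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.queue:
            # one batch put instead of len(self.queue) individual writes
            self.batches_sent += 1
            self.queue = []
        self.last_flush = time.time()

w = BatchWriter(max_queue_size=100)
for i in range(250):
    w.write({"id": i})
w.flush()  # drain the remainder, e.g. at end of request
print(w.batches_sent)  # 3 -- instead of 250 separate write requests
```

Setting both thresholds to zero would make every write flush immediately, which is exactly the "caching disabled" behavior I described.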
The same functions could be used, like GoogleWrite() or GoogleRead(), but behind the scenes OpenBD would intelligently broker requests between the application, the cache and the datastore based on a few simple rules. The details of when to cache, how to cache, synchronizing what's dirty and what's not, and queuing and sending batch requests would all be hidden from the developer.

In this case we are lucky that GAE and the datastore have the limitations that they do. Since there are no joins, complex query functions, groupings, or variations in performance based on the nature of your data to worry about, it seems feasible to develop a caching algorithm that is optimal for any system. Similarly, there is only one viable caching technology to choose from - one developed precisely for this use case, and one with an API: memcache. It would be the obvious choice of cache technology regardless of how the cache is implemented. The logic and choices behind implementing a caching layer on GAE are relatively straightforward compared to a more open environment. That reduces the innovation and customization that would be possible if caching were left up to the application to manage, and therefore reduces the value of implementing it there. Why re-invent the wheel for each app?

Baz

On Sun, Jun 7, 2009 at 12:09 PM, Vince Bonfanti <[email protected]> wrote:

> I don't know yet. Those are all very good questions. My plans are to finish
> up the virtual file system implementation this week (adding support for
> CFDIRECTORY, CFFILE, CFCONTENT, FileExists, DirectoryExists, and maybe some
> other related tags/functions), then revisit the datastore layer, including
> caching and some of the other items we've been discussing.
>
> Vince
>
> On Sun, Jun 7, 2009 at 1:44 PM, Baz <[email protected]> wrote:
>
>> If I query and cache 10 records, and then on another request I try to
>> query just one of those records, will openbd give me the record from the
>> cache or re-query the datastore? Similarly, if I were writing an object to
>> the datastore, will the cache have the ability to queue up that write
>> request until there were enough to send a single, more efficient batch
>> request?
>>
>> Baz
>>
>> On Sun, Jun 7, 2009 at 7:10 AM, Vince Bonfanti <[email protected]> wrote:
>>
>>> Yes.
>>>
>>> On Sun, Jun 7, 2009 at 5:24 AM, Baz <[email protected]> wrote:
>>>
>>>> Vince, are you planning on implementing a caching layer on top of the
>>>> google datastore using the memcache APIs?

--~--~---------~--~----~------------~-------~--~----~
Open BlueDragon Public Mailing List
http://groups.google.com/group/openbd?hl=en
official site @ http://www.openbluedragon.org/
!! save a network - trim replies before posting !!
-~----------~----~----~----~------~----~------~--~---
