On 16 Oct 2006, at 4:29, Shane Ambler wrote:

Harvell F wrote:

Getting back to the original posting, as I remember it, the question was about seldom changed information. In that case, and assuming a repetitive query as above, a simple query results cache that is keyed on the passed SQL statement string and that simply returns the previously cooked result set would be a really big performance win.

I believe the main point that Mark made was the extra overhead is in the sql parsing and query planning - this is the part that postgres won't get around. Even if you setup simple tables for caching it still goes through the parser and planner and looses the benefits that memcached has. Or you fork those requests before the planner and loose the benefits of postgres. The main benefit of using memcached is to bypass the parsing and query planning.

That was the basis of my suggestion to just use the passed query string as the key. No parsing or processing of the query, just a simple string match.

You will find there is more to sql parsing than you first think, it needs to find the components that make up the sql statement (tables column names functions) and check that they exist and can be used in the context of the given sql and the given data matches the context that is given to be used in, it needs to check that the current user has enough privileges to perform the requested task, then it locates the data whether it be in the memory cache, on disk or an integrated version of memcached, this would also include checks to make sure another user hasn't locked the data to change it and whether there exists more than one version of the data, committed and uncommitted and then sends the results back to the client requesting it.

The user permissions checking is a potential issue but again, for the special case of repeated queries by the same user (the webserver process) for the same data, a simple match of the original query string _and_ the original query user, would still be very simple. The big savings by having the simple results cache would be the elimination of the parsing, planning, locating, combining, and sorting of the results set.

I don't believe normal locking plays a part in the cache (there are basic cache integrity locking issues though) nor does the versioning or commit states, beyond the invalidation of the cache upon a commit to a referenced table. It may be that the invalidation needs to happen whenever a table is locked as well. (The hooks for the invalidation would be done during the original caching of the results set.)

I know that the suggestion is a very simple minded suggestion and is limited to a very small subset of the potential query types and interactions, however, at least for web applications, it would be a very big win. Many website want to display today's data on their webpage and have it change as dates change (or as users change). The data in the source table doesn't change very often (especially compared to a popular website) and the number of times that the exact same query could be issued between changes can measure into the hundreds of thousands or more. Putting even this simple results cache into the database would really simplify the programmer's life and improve reliability (and the use of PostgreSQL).

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?


Reply via email to