Re: [HACKERS] Again, sorry, caching.

2002-04-14 Thread Bruce Momjian

Greg Copeland wrote:
 At this point in time, I think we've both pretty well beat this topic
 up.  Obviously there are two primary ways of viewing the situation.  I
 don't think anyone is saying it's a bad idea...I think everyone is
 saying that it's easier to address elsewhere and that overall, the net
 returns may be at the expense of some other work loads.  So, unless
 there are new pearls to be shared and gleaned, I think the topic's been
 fairly well addressed.  Does more need to be said?

With a PREPARE/EXECUTE patch now out for approval, can I assume we will
go with that first, see how far it gets us, and then revisit the idea
of cached results?  In this case, we are caching the query plan.  The
query still executes again in the executor, so the data is always fresh.
In a sense, the buffer cache and disk are the caches, which don't need
separate invalidation if some data changes in the table.

The plan can become invalid if the query calls a non-cacheable function,
if the schema changes, if the constants used to generate the plan would
lead the optimizer to a different plan than the constants used in a
later query, or if the analyze statistics have changed.
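
As a rough illustration of the kind of plan caching under discussion
(assuming the patch follows the SQL-level PREPARE/EXECUTE syntax that
has been floated; the table and parameter here are hypothetical):

    PREPARE getuser (integer) AS
        SELECT * FROM users WHERE id = $1;

    EXECUTE getuser(42);  -- planning is skipped, but the executor still
                          -- runs, so the data returned is always fresh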

The MVCC ramifications of cached results and invalidation could be quite
complex.  The commit of a transaction could change tuple visibility
rules even if the data modify statement was executed much earlier in the
transaction.

Also, on the NOTIFY/trigger idea, triggers are called on statement end,
not transaction end, so if an UPDATE query is in a multi-statement
transaction, another backend looking for the NOTIFY will receive it
before the transaction commits, meaning it will not see the update. 
That seems like a problem.  We do have deferrable constraints which will
only do checking on transaction end, but I am not sure if that can be
used for NOTIFY on transaction commit.  Anyone?
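
For reference, the trigger-plus-NOTIFY arrangement under discussion
looks roughly like this (a sketch only; the function and table names
are made up, and as it turns out below, the NOTIFY is in fact held
until the transaction commits):

    CREATE FUNCTION mytable_notify() RETURNS opaque AS '
    BEGIN
        NOTIFY mytable_changed;  -- queued; delivered only after commit
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER mytable_notify AFTER INSERT OR UPDATE OR DELETE
        ON mytable FOR EACH ROW EXECUTE PROCEDURE mytable_notify();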

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Again, sorry, caching.

2002-04-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Also, on the NOTIFY/trigger idea, triggers are called on statement end,
 not transaction end, so if an UPDATE query is in a multi-statement
 transaction, another backend looking for the NOTIFY will receive it
 before the transaction commits, meaning it will not see the update. 

No it won't.

regards, tom lane




Re: [HACKERS] Again, sorry, caching.

2002-04-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Also, on the NOTIFY/trigger idea, triggers are called on statement end,
  not transaction end, so if an UPDATE query is in a multi-statement
  transaction, another backend looking for the NOTIFY will receive it
  before the transaction commits, meaning it will not see the update. 
 
 No it won't.

Is this because NOTIFY is held for transaction end or because the
triggers are held until transaction commit?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Again, sorry, caching.

2002-04-14 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 No it won't.

 Is this because NOTIFY is held for transaction end or because the
 triggers are held until transaction commit?

The former.

regards, tom lane




Re: [HACKERS] Again, sorry, caching.

2002-04-14 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  No it won't.
 
  Is this because NOTIFY is held for transaction end or because the
  triggers are held until transaction commit?
 
 The former.

Thanks.  I see it in the NOTIFY manual page now.  Very nice.
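
The behavior in question, sketched as two concurrent psql sessions
(session labels are comments; mytab_changed is the hypothetical
notification from the trigger sketch above):

    -- session 1:
    LISTEN mytab_changed;

    -- session 2:
    BEGIN;
    UPDATE mytab SET val = 1;  -- trigger fires, NOTIFY is queued
    -- session 1 has received nothing yet
    COMMIT;
    -- session 1 now receives the notification, and any query it
    -- issues in response will see the committed update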

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] Again, sorry, caching, (Tom What do you think: function

2002-03-22 Thread Jan Wieck

mlw wrote:
 Jan Wieck wrote:
 
  mlw wrote:
   [...]
  
   IMHO modifying the function manager to allow the return of a full row, and a
   set of full rows, answers a LOT of issues I have seen over the years with
   PostgreSQL extensibility.
 
  Sure.  Actually I think you'll have an easy project with this
  one, because all the work has been done by Tom already.
 
  The function manager isn't the problem any more.  It is that
  you cannot have such a set-returning function in the rangetable,
  so you have no mechanism to USE the result.

 I'm not sure I follow you. OK, maybe I identified the wrong portion of code.

 The idea is that the first return value could return an array of varlenas, one
 for each column, then a set of varlenas, one for each column.

 Is there a way to return this to PostgreSQL?

There is a way to return anything.  The problem in PostgreSQL is to
actually USE it.

Our idea originally was to extend the capabilities of a rangetable
entry.  Currently, rangetable entries can only hold a relation, which
is a table or a view.  After rewriting, they are down to real tables
only.

But basically, a rangetable entry should just be a row-source, so that
a function returning a row-set could occur there too.

In order to avoid multiple calls to the function because of nestloops
and the like, I think that when a set function occurs in an RTE, its
result should be dumped into a sort-tape and that used as the row
source in the rest of the plan.
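
If rangetable entries could hold such a row-source, the user-visible
result would presumably be something like this (hypothetical syntax and
names; this is what the proposal would enable, not current behavior):

    -- a set-returning function used directly as a FROM item:
    SELECT s.col1, s.col2
    FROM getsubset('bar') AS s (col1 text, col2 integer)
    WHERE s.col2 > 10;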


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #






Re: [HACKERS] Again, sorry, caching, (Tom What do you think: function

2002-03-21 Thread Gavin Sherry

Neil,

Following is an email I sent the other day detailing how this works.

The entry point to the underlying invalidation system is the heap
manipulation functions: heap_delete(), heap_update(). (I've just had a
quick look at heap_insert() and cannot find where the cache modification
takes place)

These call RelationInvalidateHeapTuple() ->
PrepareForTupleInvalidation() ->
RegisterCatcacheInvalidation()/RegisterRelcacheInvalidation().

These feed linked lists which get processed at the end of the
transaction, as detailed below.  Clearly, this is a much better way of
running LISTEN/NOTIFY than storing the notifications in a system table.

Gavin

-- Forwarded message --
Date: Wed, 20 Mar 2002 02:17:09 +1100 (EST)
From: Gavin Sherry [EMAIL PROTECTED]
To: Greg Copeland [EMAIL PROTECTED]
Cc: mlw [EMAIL PROTECTED], Jeff Davis [EMAIL PROTECTED],
 PostgreSQL-development [EMAIL PROTECTED]
Subject: Re: [HACKERS] Again, sorry, caching,  (Tom What do you think: function

On 19 Mar 2002, Greg Copeland wrote:

 On Tue, 2002-03-19 at 07:46, mlw wrote:
 [snip]
 
  Right now, the function manager can only return one value, or one set of values
  for a column. It should be possible, though it would require a lot of research, to enable
  the function manager to return a set of rows. If we could get that working, it
  could be fairly trivial to implement a cache as a contrib project. It would
  work something like this:
  
  select querycache(select * from mytable where foo='bar') ;
 
 Interesting concept...but how would you know when the cache has become
 dirty?  That would give you a set of rows...but I don't understand what
 would let you know your result set is invalid?
 
 Perhaps: select querycache( foobar_event, select * from mytable where
 foo='bar' ) ; would automatically create a listen for you??


Personally, I think this method of providing query caching is very
messy. Why not just implement this alongside the system relation
cache? This may be slightly more time consuming but it will perform
better and will be able to take advantage of Postgres's current MVCC.

There would be three times when the cache would be interacted with:

1) insertion of a result set into the cache

ExecRetrieve() would need to be modified to handle a
prepare-for-cache-update kind of feature. This would involve adding the
tuple table slot data into a linked list.

At the end of processing/transaction, and if the query was successful, the
prepare-for-cache-update list could be processed by AtCommit_Cache()
(called from CommitTransaction()) and the shared cache updated.

2) attempt to get result set from cache

Before planning in postgres.c, test if the query will produce an already
cached result set. If so, send the data off from cache.

3) modification of underlying heap

Like (1), produce a list inside the executor (ExecAppend(), ExecDelete(),
ExecReplace() -> RelationInvalidateHeapTuple() ->
PrepareForTupleInvalidation()) which gets processed by
AtEOXactInvalidationMessages(). This results in the affected entries being
purged.

---

I'm not sure that cached results are a direction postgres needs to move
in. But if it does, I think this is a better way to do it (given that I
may have overlooked something) than modifying the function manager (argh!).

Gavin








Re: [HACKERS] Again, sorry, caching.

2002-03-20 Thread Ross J. Reedstrom

On Tue, Mar 19, 2002 at 08:28:19PM -0500, Neil Conway wrote:
 On Tue, 2002-03-19 at 19:20, F Harvell wrote:
  I feel that the caching should be SQL transparent.  If it is
  implemented reasonably well, the performance gain should be pretty
  much universal.
 
 Well, the simple query cache scheme that is currently being proposed
 would use a byte-by-byte comparison of the incoming query. I think the
 consensus is that for a lot of workloads, this would be a bad idea.

Apparently, no one actually read _my_ proposal, they just replied to it.
All the arguments about whether this kind of cache is any good have been
thrashed out, up-thread. Apparently Mr. Harvell didn't feel the need
to go back and read them. Going over them again is not productive -
the next step is to see if there is anything to actually _code_ here.

Caching is a hard problem, once you start digging into it. Going from
no cache to some cache is (almost) always a win, but multiple caches in
a datapath can interact in non-intuitive ways. And we _already_ have
several, well tuned, clever caches in place. Anything that messes with
them is going to be rejected, for sure. 

What I proposed is a sort of compromise: it is clear to me that the core
developers are not very interested in the kind of cache Neil is talking
about above, and would rather see query caching done in the app. What I
proposed is extending the backend's support for client-side caching, to
make it easier (or possible) for middleware to automate the task.

The bare bones are: flag a query in some way so the backend auto generates
the appropriate NOTIFY triggers, so the middleware can do proper cache
maintenance by LISTENing. 
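
Today the middleware has to wire this up by hand; a rough sketch of the
manual equivalent, using rules (the table and notification names are
hypothetical):

    -- backend side: fire a notification whenever the table changes
    CREATE RULE products_notify AS ON UPDATE TO products
        DO NOTIFY products_changed;
    -- (similar rules for INSERT and DELETE)

    -- middleware side: subscribe, and invalidate any cached result
    -- sets over the table when the notification arrives
    LISTEN products_changed;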

I think I'll go away and write up my compromise proposal a little more
clearly, and post it under a new subject, later. Perhaps we can then
have a productive discussion about _it_, and not rehash old arguments.

Ross

P.S. 

HACKER sociological opinion below - feel free to skip - 

There are only three reasons to discuss features on HACKERS: to
see if a proposed feature would be rejected, so you don't waste time
implementing it; to refine a proposed implementation, so it doesn't have
to be reworked; and to discuss an actual in-hand implementation. Notice
that there's no way to skip step one: if the CVS committers don't like
the feature, arguing for it on HACKERS won't make it magically better:
providing an implementation that doesn't do bad things _might_. And you
can always maintain an independent patch, or fork.

So, we have a number of people who think a query cache would be a
good idea.  And core developers who are not convinced.  I think one
of the reasons is that, while it might be useful in some situations
(even fairly common situations) it's neither elegant nor flexible. The
PostgreSQL project has a long tradition of turning down narrow, 'good
enough - it works for me' solutions, while looking for a better, more
inclusive solution. Sometimes this has been a problem with missing
features, but in the long run, it's been a win.




Re: [HACKERS] Again, sorry, caching.

2002-03-19 Thread Michael Alan Dorman

Neil Conway [EMAIL PROTECTED] writes:
 My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get
 the press that it deserves. If this model isn't widely used because
 of some deficiencies in the LISTEN/NOTIFY implementation, IMHO our
 time would be better spent fixing those problems than implementing
 the proposed caching scheme.

I would have to say I think a large part of the problem is lack of
press---I've been hanging around pgsql-hackers for two or three years
now, and until all this discussion, had never heard of LISTEN/NOTIFY.

That doesn't mean I didn't miss prior mentions, but it certainly
doesn't seem to come up often or get a lot of discussion when it does.

Now that I know about it, well, it looks like it would be trivial to
use it to implement cache invalidation logic in my HTML::Mason-based
application---I need only have a long-lived process running on the web
server(s) that uses the perl Pg interface, and sits listening, and
when it sees notifies for given conditions, flushes the appropriate local
caches.

I'd actually been contemplating cramming logic to do this down into
the library I use for implementing the system logic, but had resisted
doing it because it would make the library too tied to the web---this
would be much easier.

I won't even have to re-grab the results from the DB and reformat and
all that crap, I can just spew the output from the last time the page
was assembled---sounds better to me than what MySQL provides.  Of
course, I get a lot of this for free as a result of the tools I'm
using, but surely this sort of thing shouldn't be all that hard to
implement in other systems.

Mike.




Re: [HACKERS] Again, sorry, caching.

2002-03-19 Thread Jeff Davis

 Yes...I was thinking that a generic library interface with a nice design
 pattern might meet this need rather well.  Done properly, I think we can
 make it where all that, more or less, would be needed is application
 hooks which accept the result set to be cached and a mechanism to signal
 invalidation of the current cache...obviously that's not an exhaustive
 list... :)

A library implies that the application is running long enough to actually 
hear the notification. Web apps start up, read from the database, and before 
any cache is needed they're done and the next one starts up, reading again 
from the database. Only currently open connections receive the notification.

I think that you do need an entire layer... but that's not a bad thing 
necessarily. Have a daemon that stays connected for a long time; when a 
notification arrives, it rewrites the cache (or marks it dirty). Other 
applications can read data from static files or shared memory, or even 
another communication socket with your daemon.

There may be some way around running a daemon, so if you have a better 
solution please let me know.

I think I am in favor of client caching in general, but mlw (sorry, I can't 
find your real name in the emails at hand) makes some good points. The most 
important one is that we don't want to change application architectures on 
everyone. It's easy if you just have to add iscachable to a query; it's 
hard if you have to start writing against a different set of routines (to 
grab from your client cache rather than a database). 

However, I am perfectly happy writing a client-side cache or using temp 
tables to store a result set. I also don't care that much if someone chooses 
PostgreSQL for their website (unless I'm responsible for its success in some 
way :) ). That's me personally, if you want to attract more users from mysql, 
iscachable is very likely an attractive feature.

Regards,
Jeff





Re: [HACKERS] Again, sorry, caching, (Tom What do you think: function

2002-03-19 Thread Greg Copeland

On Tue, 2002-03-19 at 07:46, mlw wrote:
[snip]

 Right now, the function manager can only return one value, or one set of values
 for a column. It should be possible, though it would require a lot of research, to enable
 the function manager to return a set of rows. If we could get that working, it
 could be fairly trivial to implement a cache as a contrib project. It would
 work something like this:
 
 select querycache(select * from mytable where foo='bar') ;

Interesting concept...but how would you know when the cache has become
dirty?  That would give you a set of rows...but I don't understand what
would let you know your result set is invalid?

Perhaps: select querycache( foobar_event, select * from mytable where
foo='bar' ) ; would automatically create a listen for you??

 
 This does two things that I would like to see: The ability to cache subselects
 independent of the full query. The ability to control which queries get cached.
 
 If we can get this row functionality in the function manager for 7.3, we could
 then implement MANY MANY enterprise level functionalities. Remote queries,
 query caching, external tables, etc. as contrib projects rather than full blown
 modifications to PostgreSQL.

Correct me if I'm wrong, but this concept would also be applicable to
some clustering/distributed query needs (is that what you meant by
remote queries), too?

Greg






Re: [HACKERS] Again, sorry, caching, (Tom What do you think: function

2002-03-19 Thread Greg Copeland

On Tue, 2002-03-19 at 07:46, mlw wrote:
 I was thinking about this. There seems to be a consensus that caching means no
 ACID compliance. And everyone seems to think it needs to be limited. We can
 implement a non-ACID cache as a contrib function with some work to the function
 manager.

Until now, I hadn't really thought about it...I just took it for
granted since it was asserted...however, what isn't ACID about the
approach that I offered?

A - Not affected...it's read only and provided directly from the
database, thus, it's still a function of the database.  Any changes
resulting from atomic operations are notified to the cache, whereby it is
repopulated.
C - Not affected...the database is still responsible for keeping
consistency.  The cache is still read only.  State is ensured because
invalidations are notified by the database, and the data set returned by
the database must be consistent or the database is broken.
I - Again, the database is still performing this task and notifies the
cache when updates need to take place.  Again, isolation isn't an issue
because the cache is still read-only.
D - Durability isn't a question either as, again, the database is still
doing this.  In the event of cache failure...it would be repopulated
from the database...so it would be as durable as the database is.

Please help me understand.

Thanks,
Greg






Re: [HACKERS] Again, sorry, caching.

2002-03-19 Thread Doug McNaught

Jeff Davis [EMAIL PROTECTED] writes:

 A library implies that the application is running long enough to actually 
 hear the notification. Web apps start up, read from the database, and before 
 any cache is needed they're done and the next one starts up, reading again 
 from the database. Only currently open connections receive the notification.

If your web app works this way then you already don't care about
performance.  People doing scalable web apps these days use connection
pooling and session data kept in memory, so you already have a
persistent layer running (whether it's your JVM, Apache process for
mod_perl or PHP, or whatever).  Really big apps definitely have a
long-running daemon process that handles caching, session management
(so you can have multiple webservers) etc etc...

-Doug
-- 
Doug McNaught   Wireboard Industries  http://www.wireboard.com/

  Custom software development, systems and network consulting.
  Java PostgreSQL Enhydra Python Zope Perl Apache Linux BSD...




Re: [HACKERS] Again, sorry, caching, (Tom What do you think: function

2002-03-19 Thread mlw

Gavin Sherry wrote:
 I'm not sure that cached results are a direction postgres needs to move in. But
 if it does, I think this is a better way to do it (given that I may have
 overlooked something) than modifying the function manager (argh!).

I actually had an ulterior motive.

Your comment about caching not being a direction in which PostgreSQL needs to
move says it all. The general rank and file seems to agree. I think caching
could speed up a number of things, certainly some of the stuff I have been
working on. I think we are more likely to get some sort of caching from a
contrib project than by trying to sway the core team.

IMHO modifying the function manager to allow the return of a full row, and a
set of full rows, answers a LOT of issues I have seen over the years with
PostgreSQL extensibility.

With a full row function API we can implement:

(1) Remote Queries
select remotequery(hostname, port, 'select * from foo');

(2) External queries
select mysqlquery(hostname, port, 'select * from foo');

(3) Cached queries
select cachedquery('select * from foo');

(4) Full text search
select ftssquery(hostname, port, 'word1 and word2 and word3 not word4');

Again, with full row functions, we could prototype/implement many advanced
features in PostgreSQL as contrib projects.




Re: [HACKERS] Again, sorry, caching.

2002-03-19 Thread F Harvell

On Tue, 19 Mar 2002 12:12:52 CST, Ross J. Reedstrom wrote:
 On Mon, Mar 18, 2002 at 09:35:51PM -0500, Neil Conway wrote:
   
   It is an application issue
   This is completely wrong. Caching can not be done against a database without
   knowledge of the database, i.e. when the data changes.
  ...
  
  If we're looking to provide a quick and easy caching scheme for users
  attracted to MySQL's query cache, why not provide this functionality
  through another application?
  ...
  
  What does everyone think?
 
 Neil, this sounds like exactly the approach to follow up on: 
 ...
 
 Seems like a win all around. Anyone else have comments?
 ...

  I'm not certain of the full direction of the thinking here; however, it
seems to me that there are a few considerations that I would like to
see/keep in mind:

I feel that the caching should be SQL transparent.  If it is
implemented reasonably well, the performance gain should be pretty
much universal.  Yes, a large number of queries would never be called
again, however, the results still need to be fetched into memory and
caching them for later reuse should involve little more than a
skipped free (think filesystem cache).  It makes more sense to specify
non-cacheable in a query for tuning than cacheable.  This also
means that just switching databases to PostgreSQL improves my
performance.

Also, it is very important that the caching should be transparent to
the application.  This means that the application should be able to
connect to the database using the standard application interface
(i.e., ODBC, PHP, Perl/DBI, etc.)  This allows me to port my existing
Oracle/DB2/MySQL/etc. application to pgsql through normal porting.  If
I have to implement a non-standard interface, I can likely gain even
more performance through custom code and maintain reasonable database
independence.

While I am a strong believer in PostgreSQL, many of my customers have
other demands/requirements.  I still want to be able to use my
existing code and libraries when using their database.  Sticking with
the standards allows me to develop best of class applications and
interface to best of class databases.  It also allows others to easily
realize the value of PostgreSQL.

Thanks,
F Harvell







Re: [HACKERS] Again, sorry, caching.

2002-03-19 Thread Neil Conway

On Tue, 2002-03-19 at 19:20, F Harvell wrote:
 I feel that the caching should be SQL transparent.  If it is
 implemented reasonably well, the performance gain should be pretty
 much universal.

Well, the simple query cache scheme that is currently being proposed
would use a byte-by-byte comparison of the incoming query. I think the
consensus is that for a lot of workloads, this would be a bad idea.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC





Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Karel Zak

On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:

 If it is mostly static data, why not just make it a static page?
 Because a static page is a maintenance nightmare. One uses a database in a web
 site to allow content to be changed and upgraded dynamically and with a minimum
 of work.

 That's weak argumentation for a DB cache. Why not generate the web page 
 after a data change and serve it as static until the next change?

 I was thinking that it could be implemented as a keyword or comment in a query.
 Such as:
 
 select * from table where column = 'foo' cacheable

 You can insert the mostly static data into a temp table and use that 
 temp table in subsequent queries. After an update/delete/insert your
 application can rebuild the temp table (or a trigger can?).
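
Spelled out, that approach would look something like this (a sketch;
mytable and col stand in for the table and column from the earlier
example, which used reserved words):

    -- populate the cache once:
    SELECT * INTO TEMP cache_foo FROM mytable WHERE col = 'foo';
    -- later queries hit the temp table instead:
    SELECT * FROM cache_foo;
    -- after an insert/update/delete, the application rebuilds it:
    DROP TABLE cache_foo;
    SELECT * INTO TEMP cache_foo FROM mytable WHERE col = 'foo';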

Karel

-- 
 Karel Zak  [EMAIL PROTECTED]
 http://home.zf.jcu.cz/~zakkr/
 
 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread mlw

Karel Zak wrote:
 
 On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:
 
  If it is mostly static data, why not just make it a static page?
  Because a static page is a maintenance nightmare. One uses a database in a web
  site to allow content to be changed and upgraded dynamically and with a minimum
  of work.
 
  That's weak argumentation for a DB cache. Why not generate the web page
  after a data change and serve it as static until the next change?
 
  I was thinking that it could be implemented as a keyword or comment in a query.
  Such as:
 
  select * from table where column = 'foo' cacheable
 
  You can insert the mostly static data into a temp table and use that
  temp table in subsequent queries. After an update/delete/insert your
  application can rebuild the temp table (or a trigger can?).

Yes, I could, as could most of the guys reading these messages. I am thinking
about a feature in PostgreSQL that would make that easier for the average DBA
or web producer.

Let's face it, MySQL wins a lot of people because they put in features that
people want. All the ways people have suggested to compete with MySQL's
caching have been ugly kludges. 

I understand that there is an amount of work involved in doing caching, and
that the value of caching is debated by some; however, it is demonstrable that
caching can improve a very common, albeit specific, set of deployments. Also,
managing data is the job of the database, not the application. It does belong
in PostgreSQL; if someone is forced to write a caching scheme around
PostgreSQL, it is because PostgreSQL lacks that feature.




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Jean-Michel POURE

On Monday 18 March 2002 13:23, mlw wrote:
 Let's face it, MySQL wins a lot of people because they put in features that
 people want.

MySQL is very interested in benchmarks.
It does not really care for data consistency.

Cheers, Jean-Michel POURE




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Greg Copeland

Yes.  EVERY person that I've ever known who runs MySQL runs it for two
very simple reasons.  First, they believe it to be wicked fast.  Second,
they don't understand what ACID is, what a transaction is, or why
running a single session against a database to perform a benchmark is a
completely bogus concept.  In case it's not obvious, these are usually
people that are trying to take a step up from Access.  While I do
believe MySQL, from a performance perspective, is a step up from Access,
I always tell my clients...if you wouldn't use an Access database for
this project, you shouldn't use MySQL either.

To me, this means we need better advertising, PR, and education rather
than a result set cache.  :P

Speaking of which, I'm wondering if there are any design patterns we can
look at which would address client side caching...well, at least make it
easier to implement as well as implement it in a consistent manner.

Greg


On Mon, 2002-03-18 at 07:32, Jean-Michel POURE wrote:
 On Monday 18 March 2002 13:23, mlw wrote:
  Let's face it, MySQL wins a lot of people because they put in features that
  people want.
 
 MySQL is very interested in benchmarks.
 It does not really care for data consistency.
 
 Cheers, Jean-Michel POURE
 






Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread mlw

Jean-Michel POURE wrote:
 
 On Monday 18 March 2002 13:23, mlw wrote:
  Let's face it, MySQL wins a lot of people because they put in features that
  people want.
 
 MySQL is very interested in benchmarks.
 It does not really care for data consistency.

In no way am I suggesting we avoid ACID compliance. In no way am I suggesting
that PostgreSQL change. All I am suggesting is that tables which change
infrequently can and should be cached.

select * from table where foo = 'bar'

Need not be executed twice if the table has not changed. 

select * from table1, (select * from table2 where foo='bar' cacheable) as
subset where subset.col1 = table1.col1;

In the above query, if table two changes 4 times a day, and is queried a couple
of times a minute or second, the caching of the subset could save a huge amount
of disk I/O.

This sort of query could improve many catalog based implementations, from
music, to movies, to books. A library could implement a SQL query for book
lookups like this:

select * from authors, (select * from books where genre = 'scifi' cacheable) as
subset where authors.id = subset.authorid and authors.id in ()

Yes it is arguable that index scans may work better, and obviously, summary
tables may help, etc. but imagine a more complex join which produces fewer
records, but is executed frequently. Caching could help the performance of
PostgreSQL in some very real applications.

MySQL's quest for benchmarking numbers, I agree, is shameful because they
create numbers which are not really applicable in the real world. This time,
however, I think they may be on to something.

(1) PostgreSQL uses a cacheable or iscacheable keyword.
(2) If the query uses functions which are not marked as iscacheable, then it
is not cached.
(3) If any table contained within the cacheable portion of the query is
modified, the cache is marked as dirty.
(4) No provisions are made to recreate the cache after an insert/update/delete.
(5) The first query marked as iscacheable that encounters a dirty flag in a
table does an exhaustive search on the cache and removes all entries that are
affected.


As far as I can see, if the above parameters are used to define caching, it
could improve performance on sites where a high number of transactions are
made and where there is also a large amount of static data, i.e. an ecommerce
site, library, etc. If the iscacheable keyword is not used, PostgreSQL will
not incur any performance degradation. However, if the iscacheable keyword
is used, the performance loss could very well be made up by the benefits of
caching, as sketched below.
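
Under those rules, the user-visible behavior would presumably look like
this (entirely hypothetical syntax, following the proposal; books is the
table from the library example):

    SELECT * FROM books WHERE genre = 'scifi' ISCACHEABLE; -- cached
    SELECT * FROM books WHERE genre = 'scifi' ISCACHEABLE; -- from cache
    UPDATE books SET price = 9 WHERE id = 7;   -- books flagged dirty
    SELECT * FROM books WHERE genre = 'scifi' ISCACHEABLE; -- purges the
        -- affected entries per rule (5), then executes and re-caches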




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread mlw

Matthew T. O'Connor wrote:
 
  My big problem with putting the cache outside of the database is that it is
  now incumbent on the applications programmer to write a cache. A database
  should manage the data, the application should handle how the data is
  presented. Forcing the application to implement a cache feels wrong.
 
 I believe someone suggested a possible solution that was in the pg client
 using NOTIFY and triggers.  The argument given against it was that
 it would not be ACID compliant.  I say, who cares.  I would think that the
 select cachable would only be allowed for simple selects, it would not be
 used for select for update or anything else.  Anytime you are given the
 result of a simple select, you are not guaranteed that the data won't change
 underneath you.  

Not true; if you begin a transaction, you can be isolated from changes made to
the database.

 The primary use that you have suggested is for web sites,
 and they certainly won't mind if the cache is 0.3 seconds out of date.

Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
is a far better system. Making PostgreSQL less accurate, less correct takes
away, IMHO, the very reasons to use it.




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Greg Copeland

On Mon, 2002-03-18 at 08:15, mlw wrote:
 Mattew T. O'Connor wrote:
  
[snip]

 
  The primary use that you have suggested is for web sites,
   and they certainly won't mind if the cache is 0.3 seconds out of date.
 
 Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
 is a far better system. Making PostgreSQL less accurate, less correct takes
 away, IMHO, the very reasons to use it.
 

If you are using a web site and you need real-time data within 0.3s,
you've implemented on the wrong platform.  It's as simple as that.  In
the web world, there are few applications where a 0.3s window is
notable.  After all, that 0.3s window can be anywhere within the
system, including the web server, network, any front end caches, dns
resolutions, etc.

I tend to agree with Matthew.  Granted, there are some application
domains where this can be critical...generally speaking, web serving
isn't one of them.

That's why all of the solutions I offered were pointedly addressing a
web server scenario and not a generalized database cache.  I completely
agree with you on that.  In a generalized situation, the database should
be managing and caching the data (which it already does).

Greg






Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread mlw

Greg Copeland wrote:
 
 On Mon, 2002-03-18 at 08:15, mlw wrote:
  Mattew T. O'Connor wrote:
  
 [snip]
 
 
  The primary use that you have suggested is for web sites,
   and they certainly won't mind if the cache is 0.3 seconds out of date.
 
  Again, if they don't care about accuracy, then they will use MySQL. PostgreSQL
  is a far better system. Making PostgreSQL less accurate, less correct takes
  away, IMHO, the very reasons to use it.
 
 
 If you are using a web site and you need real time data within 0.3s,
 you've implemented on the wrong platform.  It's as simple as that.  In
 the web world, there are few applications where a 0.3s of a window is
 notable.  After all, that 0.3s of a window can be anywhere within the
 system, including the web server, network, any front end caches, dns
 resolutions, etc.

This is totally wrong! An out of date cache can cause errors by returning
results that are no longer valid, thus causing lookup issues. That is what ACID
compliance is all about.

 
 I tend to agree with Matthew.  Granted, there are some application
 domains where this can be critical...generally speaking, web serving
 isn't one of them.
 
 That's why all of the solutions I offered were pointedly addressing a
 web server scenario and not a generalized database cache.  I completely
 agree with you on that.  In a generalized situation, the database should
 be managing and caching the data (which it already does).

But it does not cache a query. An expensive query which does an index range
scan and filters by a where clause could invalidate a good number of buffers in
the buffer cache. If this or a number of queries like it are frequently
repeated, verbatim, against a seldom-changed table, why not cache them within
PostgreSQL? It would improve overall performance by preserving more blocks in
the buffer cache and eliminate a number of queries being executed.

I don't see how applicability can be an argument against caching. I can
understand an objection from a time/work point of view, but to debate whether
it is a useful feature seems ludicrous.




Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Greg Copeland

On Mon, 2002-03-18 at 10:08, mlw wrote:
 Greg Copeland wrote:
  
  On Mon, 2002-03-18 at 08:15, mlw wrote:
   Mattew T. O'Connor wrote:
   
[snip]

  
  If you are using a web site and you need real-time data within 0.3s,
  you've implemented on the wrong platform.  It's as simple as that.  In
  the web world, there are few applications where a 0.3s window is
  notable.  After all, that 0.3s window can be anywhere within the
  system, including the web server, network, any front end caches, dns
  resolutions, etc.
 
 This is totally wrong! An out of date cache can cause errors by returning
 results that are no longer valid, thus causing lookup issues. That is what ACID
 compliance is all about.

I understand what ACID is about.  Question: was the result set valid
when it was cached?  Yes.  So will it be valid when it's returned as a
cached result set?  Yes.  Might it be an out-of-date view?  Sure...with
a horribly small window for becoming out of date.  Will it cause lookup
problems?  It might, but no more than what you are proposing.  In the
meantime, the FE cached result set, performance wise, is beating the
pants off the database cached solution on both a specific work load and
overall system performance.

I should point out that once the FE has been notified that its
cache is invalid, the FE would no longer return the invalidated result
set.  I consider that to be a given; however, from some of your comments
I get the impression that you think the invalid result set would
continue to be served.  Another way of thinking about it is...it's
really not any different from the notification acting as the
returned result set...from a validity perspective.  That is...if that
had been the returned result set (the notification) from the
database...it would be accurate (which in this case means the FE cache is
now dirty and treated as such)...if the query is re-run because it is
now invalid...the result set is once again accurate and reflective of the
database.

Example...


Database cache
Query result set
Result set returned (cached on database)
local change to database (result set cache invalid)
new query based on out of date queried result set


Application cache
Query result set (cached)
Result set returned
local change to database (app cache invalid and signaled)
new query based on out of date queried result set

Both have that problem since transactional boundaries are hard to keep
across HTTP requests.  This, again, is why for web applications a FE
cache is perfectly acceptable for *most* needs.  Also notice that your
margin for error is more or less the same.

[snip]

 I don't see how applicability can be an argument against caching. I can
 understand an objection from a time/work point of view, but to debate whether
 it is a useful feature seems ludicrous.

I don't think I'm arguing if it's applicable or useful.  Rather, I'm
saying that faster results can be yielded by implementing it in the
client with far less effort than it would take to implement in the BE. 
I am arguing that its impact on overall system performance (though I
really didn't do more than just touch on this topic) is
questionable...granted, it may greatly enhance specific work loads...at
the expense of others.  Which shouldn't be too surprising as trade offs
of some type are pretty common.

At this point in time, I think we've both pretty well beat this topic
up.  Obviously there are two primary ways of viewing the situation.  I
don't think anyone is saying it's a bad idea...I think everyone is
saying that it's easier to address elsewhere and that overall, the net
returns may be at the expense of some other work loads.  So, unless
there are new pearls to be shared and gleaned, I think the topic's been
fairly well addressed.  Does more need to be said?

Greg






Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Neil Conway

On Sat, 2002-03-16 at 09:01, mlw wrote:
 On a web site, a few specific queries get executed, unchanged, repeatedly.
 Think about an ecommerce site, most of the time it is just a handful of basic
 queries. These basic queries are usually against pretty large product tables. A
 caching mechanism would make these queries pretty light weight.
 
 The arguments against caching:
 
 It is an application issue
 This is completely wrong. Caching can not be done against a database without
 knowledge of the database, i.e. when the data changes.

But can't this be achieved by using a LISTEN/NOTIFY model, with
user-created rules to NOTIFY the appropriate listener when a table
changes? With a good notification scheme like this, you don't need to
continually poll the DB for changes. You don't need to teach your cache
a lot of things about the database, since most of that knowledge is
encapsulated inside the rules, and supporting tables.

My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get the
press that it deserves. If this model isn't widely used because of some 
deficiencies in the LISTEN/NOTIFY implementation, IMHO our time would be
better spent fixing those problems than implementing the proposed
caching scheme.

If we're looking to provide a quick and easy caching scheme for users
attracted to MySQL's query cache, why not provide this functionality
through another application? I'm thinking about a generic caching
layer that would sit in between Postgres and the database client. It
could speak the FE/BE protocol as necessary; it would use LISTEN/NOTIFY
to allow it to efficiently be aware of database changes; it would create
the necessary rules for the user, providing a simple interface for
enabling query caching for a table or a set of tables.

What does everyone think?

 OK, let me have it, tell me how terrible an idea this is. Tell me how wrong I
 am.

I think your goals are laudable (and I also appreciate the effort that
you and everyone else puts into Postgres); I just think we could get
most of the benefits without needing to implement potentially complex
changes to Postgres internals.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC





Re: [HACKERS] Again, sorry, caching.

2002-03-18 Thread Greg Copeland

On Mon, 2002-03-18 at 20:35, Neil Conway wrote:
[snip]

 My impression (I could be wrong) is that LISTEN/NOTIFY doesn't get the
 press that it deserves. If this model isn't widely used because of some 
 deficiencies in the LISTEN/NOTIFY implementation, IMHO our time would be
 better spent fixing those problems than implementing the proposed
 caching scheme.
 
 If we're looking to provide a quick and easy caching scheme for users
 attracted to MySQL's query cache, why not provide this functionality
 through another application? I'm thinking about a generic caching
 layer that would sit in between Postgres and the database client. It
 could speak the FE/BE protocol as necessary; it would use LISTEN/NOTIFY
 to allow it to efficiently be aware of database changes; it would create
 the necessary rules for the user, providing a simple interface for
 enabling query caching for a table or a set of tables.
 
 What does everyone think?
 

Yes...I was thinking that a generic library interface with a nice design
pattern might meet this need rather well.  Done properly, I think we can
make it where all that, more or less, would be needed is application
hooks which accept the result set to be cached and a mechanism to signal
invalidation of the current cache...obviously that's not an exhaustive
list... :)

I haven't spent much time on this, but I'm fairly sure some library
routines can be put together which would greatly reduce the effort of
application coders to support fe-data caches and still be portable for
even the Win32 port.

Greg






Re: [HACKERS] Again, sorry, caching.

2002-03-17 Thread mlw

Andrew Sullivan wrote:
 
 On Sat, Mar 16, 2002 at 09:01:28AM -0500, mlw wrote:
 
  If it is mostly static data, why not just make it a static page?
  Because a static page is a maintenance nightmare. One uses a
  database in a web site to allow content to be changed and upgraded
  dynamically and with a minimum of work.
 
 This seems wrong to me.  Why not build an extra bit of functionality
 so that when the admin makes a static-data change, the new static
 data gets pushed into the static files?
 
 I was originally intrigued by the suggestion you made, but the more I
 thought about it (and read the arguments of others) the more
 convinced I became that the MySQL approach is a mistake.  It's
 probably worth it for their users, who seem not to care that much
 about ACID anyway.  But I think for a system that really wants to
 play in the big leagues, the cache is a big feature that requires a
 lot of development, but which is not adequately useful for most
 cases.  If we had infinite developer resources, it might be worth it.
 In the actual case, I think it's too low a priority.

Again, I can't speak to priority, but I can name a few common applications
where caching would be a great benefit. The more I think about it, the more I
like the idea of a 'cacheable' keyword in the select statement.

My big problem with putting the cache outside of the database is that it is now
incumbent on the application programmer to write a cache. A database should
manage the data; the application should handle how the data is presented.
Forcing the application to implement a cache feels wrong.




Re: [HACKERS] Again, sorry, caching.

2002-03-17 Thread mlw

Greg Copeland wrote:
 
 On Sat, 2002-03-16 at 08:36, mlw wrote:
  Triggers and asynchronous notification are not substitutes for real hard ACID
  compliant caching. The way you suggest implies only one access model. Take the
  notion of a library; they have both web and application access. These should
  both be able to use the cache.
 
 
 Well, obviously, you'd need to re-implement the client side cache in
 each implementation of the client.  That is a down side and I certainly
 won't argue that.  As for the no substitute comment, I guess I'll
 plead ignorance because I'm not sure what I'm missing here.  What am I
 missing that would not be properly covered by that model?

It would not be guaranteed to be up to date with the state of the database. By
implementing the cache within the database, PostgreSQL could maintain the
consistency.

 
  Also, your suggestion does not address the sub-select case, which I think is
  much bigger, performance wise, and more efficient than MySQL's cache.
 
 I'm really not sure what you mean by that.  Doesn't address it but is
 more efficient?  Maybe it's because I've not had my morning coffee
 yet... ;)

If an internal caching system can be implemented within PostgreSQL (and trust
me, I understand what a hairball it would be with multiversion concurrency),
it could handle complex queries such as:

select * from (select * from mytable where foo = 'bar' cacheable) as subset
where subset.col = 'value'

The 'cacheable' keyword applied to the query would mean that PostgreSQL could
keep that result set handy for later use. If that subselect on mytable always
does a table scan, no one can deny that caching the subquery could be a huge
win.

As a side note, I REALLY like the idea of a keyword for caching as opposed to
automated caching. It would allow the DBA or developer more control over
PostgreSQL's behavior, and potentially make the feature easier to implement.

 
 
  This whole discussion could be moot, and this could be developed as an
  extension, if there were a function API that could return sets of whole rows.
 
 
Currently a function can only return one value or a 'setof' a single type,
implemented as one function call for each entry in the set. If there were a
function interface which could return a row, and multiple rows similar to the
'setof' return, that would be very cool. That way caching could be implemented
as:

select * from pgcache('select * from mytable where foo=''bar''') as subset
where subset.col = 'value';




Re: [HACKERS] Again, sorry, caching.

2002-03-17 Thread mlw

I think the notion that data is managed outside of the database is bogus. Query
caching can improve performance in some specific, but popular, scenarios.
Saying it does not belong within the database and is the job of the
application is like saying file caching is not a job of the file system but is
the job of the application.

This is a functionality many users want, and it can be justified by some very
specific, but very common, scenarios. It is not for me to say if it is worth
the work, or if it should be done. But from the perspective of the user,
having this capability within the database is an important feature; that is
the argument I want to make.

Greg Copeland wrote:
 
 I previously replied to you vaguely describing a way you could implement
 this by using a combination of client side caching and database tables
 and triggers to allow you to determine if your cache is still valid.
 Someone came right behind me, Tom maybe??, and indicated that the
 proper/ideal way to do this would be to using postgres' asynchronous
 database notification mechanisms (listen/notify I believe were the
 semantics) to alert your application that your cache has become
 invalid.  Basically, a couple of triggers and the use of the listen/notify
 model, and you should be all set.
 
 Done properly, a client side cache which is asynchronously notified by
 the database when it's contents become invalid should be faster than
 relying on MySQL's database caching scheme.  Basically, a strong client
 side cache is going to prevent your database from even having to return
 a cached result set while a database side cache is going to always
 return a result set.  Of course, one of the extra cool things you can do
 is to cache a gzip'd copy of the data contents which would further act
 as an optimization preventing the client or web server (in case they are
 different) from having to recompress every result set.
 
 In the long run, again, if properly done, you should be able to beat
 MySQL's implementation without too much extra effort.  Why?  Because a
 client side cache can be much smarter in the way that it uses its
 cached contents, much in the same way an application is able to better
 cache its data than the file system can.  This is why a
 client side cache should be preferred over that of a database result set
 cache.
 
 Greg





Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Greg Copeland

I previously replied to you vaguely describing a way you could implement
this by using a combination of client side caching and database tables
and triggers to allow you to determine if your cache is still valid. 
Someone came right behind me, Tom maybe??, and indicated that the
proper/ideal way to do this would be to using postgres' asynchronous
database notification mechanisms (listen/notify I believe were the
semantics) to alert your application that your cache has become
invalid.  Basically, a couple of triggers and the use of the listen/notify
model, and you should be all set.

Done properly, a client side cache which is asynchronously notified by
the database when its contents become invalid should be faster than
relying on MySQL's database caching scheme.  Basically, a strong client
side cache is going to prevent your database from even having to return
a cached result set while a database side cache is going to always
return a result set.  Of course, one of the extra cool things you can do
is to cache a gzip'd copy of the data contents which would further act
as an optimization preventing the client or web server (in case they are
different) from having to recompress every result set.

In the long run, again, if properly done, you should be able to beat
MySQL's implementation without too much extra effort.  Why?  Because a
client side cache can be much smarter in the way that it uses its
cached contents, much in the same way an application is able to better
cache its data than the file system can.  This is why a
client side cache should be preferred over that of a database result set
cache.

Greg

References:
http://www.postgresql.org/idocs/index.php?sql-notify.html
http://www.postgresql.org/idocs/index.php?sql-listen.html


On Sat, 2002-03-16 at 08:01, mlw wrote:
 I traded a couple emails with a guy using one of my open source projects. To
 make a long story short, he is going to the new version of MySQL for his
 website because of the new caching feature. He is convinced that it will speed
 up his web site, and he is probably right.
 
 On a web site, a few specific queries get executed, unchanged, repeatedly.
 Think about an ecommerce site, most of the time it is just a handful of basic
 queries. These basic queries are usually against pretty large product tables. A
 caching mechanism would make these queries pretty light weight.
 
 The arguments against caching:
 
 It is an application issue
 This is completely wrong. Caching can not be done against a database without
 knowledge of the database, i.e. when the data changes.
 
 If it is mostly static data, why not just make it a static page?
 Because a static page is a maintenance nightmare. One uses a database in a web
 site to allow content to be changed and upgraded dynamically and with a minimum
 of work.
 
 It isn't very useful
 I disagree completely. A cache of the most frequently used queries, or specific
 ones, could make for REALLY good performance in some very specific, but very
 common, applications. Any system that has a hierarchical drill-down interface
 to a data set (ecommerce, libraries, document management systems, etc.) will
 greatly benefit from a query cache.
 
 I was thinking that it could be implemented as a keyword or comment in a query.
 Such as:
 
 select * from table where column = 'foo' cacheable
 or
 select * from table where column = 'bar' /* cacheable */
 
 Either way, it would speed up a lot of common application types. It would even
 be very cool if you could just cache the results of subqueries, such as:
 
 select * from (select * from table where col1 = 'foo' cacheable) as subset
 where subset.col2 = 'bar' ;
 
 Which would mean that the subquery gets cached, but the outer select need not
 be. The cache could be like a global temp table. Perhaps the user could even
 name the cache entry:
 
 select * from table where column = 'foo' cache on foo
 
 Where one could also do:
 
 select * from cache_foo
 
 Using a keyword is probably a better idea: it can be picked up by the parser
 to instruct PostgreSQL to use the cache; otherwise there will be no additional
 overhead.
 
 Having caching within PostgreSQL will be good for data integrity. Application
 caches can't tell when an update/delete/insert happens, so they often have to
 use a time-out mechanism.
 
 OK, let me have it: tell me how terrible an idea this is, tell me how wrong I
 am.
 






Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread mlw

Triggers and asynchronous notification are not substitutes for real, hard,
ACID-compliant caching. The way you suggest implies only one access model.
Take the notion of a library: it has both web and application access, and
both should be able to use the cache.

Also, your suggestion does not address the sub-select case, which I think is
a much bigger win, performance-wise, and more efficient than MySQL's cache.

This whole discussion could be moot, and this could be developed as an
extension, if there were a function API that could return sets of whole rows.
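
For what it's worth, later PostgreSQL releases did grow such an API:
set-returning functions that hand back whole rows. A hedged sketch in
that later syntax, with hypothetical table and function names:

    -- hypothetical names; callers treat the function like a table, so a
    -- cache-backed implementation could sit behind the same interface
    CREATE FUNCTION hot_products(cat text) RETURNS SETOF products AS $$
        SELECT * FROM products WHERE category = cat;
    $$ LANGUAGE sql STABLE;

    SELECT * FROM hot_products('books');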



Greg Copeland wrote:
 
 I previously replied to you vaguely describing a way you could implement
 this by using a combination of client side caching and database tables
 and triggers to allow you to determine if your cache is still valid.
 Someone came right behind me, Tom maybe??, and indicated that the
 proper/ideal way to do this would be to use postgres' asynchronous
 database notification mechanisms (LISTEN/NOTIFY, I believe, were the
 semantics) to alert your application that your cache has become
 invalid.  Basically, a couple of triggers and the use of the
 listen/notify model, and you should be all set.
 
 Done properly, a client-side cache which is asynchronously notified by
 the database when its contents become invalid should be faster than
 relying on MySQL's database caching scheme.  Basically, a strong
 client-side cache prevents your database from even having to return a
 cached result set, while a database-side cache always returns a result
 set.  Of course, one of the extra cool things you can do is cache a
 gzip'd copy of the data contents, which would further act as an
 optimization, preventing the client or web server (if they are
 different) from having to recompress every result set.
 
 In the long run, again, if properly done, you should be able to beat
 MySQL's implementation without too much extra effort.  Why?  Because a
 client-side cache can be much smarter about how it uses its cached
 contents, much as an application can cache its data better than the
 file system can.  This is why a client-side cache should be preferred
 over a database result-set cache.
 
 Greg
 
 References:
 http://www.postgresql.org/idocs/index.php?sql-notify.html
 http://www.postgresql.org/idocs/index.php?sql-listen.html
 
 On Sat, 2002-03-16 at 08:01, mlw wrote:
  I traded a couple of emails with a guy using one of my open source projects.
  To make a long story short, he is moving to the new version of MySQL for his
  website because of the new caching feature. He is convinced that it will speed
  up his web site, and he is probably right.
 
  On a web site, a few specific queries get executed, unchanged, repeatedly.
  Think about an ecommerce site: most of the time it is just a handful of basic
  queries. These basic queries usually run against pretty large product tables.
  A caching mechanism would make these queries pretty lightweight.
 
  The arguments against caching:
 
  It is an application issue
  This is completely wrong. Caching cannot be done against a database without
  knowledge of the database, i.e. knowing when the data changes.
 
  If it is mostly static data, why not just make it a static page?
  Because a static page is a maintenance nightmare. One uses a database in a web
  site to allow content to be changed and upgraded dynamically and with a minimum
  of work.
 
  It isn't very useful
  I disagree completely. A cache of the most frequently used queries, or specific
  ones, could make for REALLY good performance in some very specific, but very
  common, applications. Any system that has a hierarchical drill-down interface
  to a data set (ecommerce, libraries, document management systems, etc.) will
  greatly benefit from a query cache.
 
  I was thinking that it could be implemented as a keyword or comment in a query.
  Such as:
 
  select * from table where column = 'foo' cacheable
  or
  select * from table where column = 'bar' /* cacheable */
 
  Either way, it would speed up a lot of common application types. It would even
  be very cool if you could just cache the results of subqueries, such as:
 
  select * from (select * from table where col1 = 'foo' cacheable) as subset
  where subset.col2 = 'bar' ;
 
  Which would mean that the subquery gets cached, but the outer select need not
  be. The cache could be like a global temp table. Perhaps the user could even
  name the cache entry:
 
  select * from table where column = 'foo' cache on foo
 
  Where one could also do:
 
  select * from cache_foo
 
  Using a keyword is probably a better idea: it can be picked up by the parser
  to instruct PostgreSQL to use the cache; otherwise there will be no additional
  overhead.
 
  Having caching within PostgreSQL will be good for data integrity. Application
  caches can't tell when an update/delete/insert happens, so they often have to
  use a time-out mechanism.
 

Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Greg Copeland

On Sat, 2002-03-16 at 08:01, mlw wrote:
[snip]

 If it is mostly static data, why not just make it a static page?
 Because a static page is a maintenance nightmare. One uses a database in a web
 site to allow content to be changed and upgraded dynamically and with a minimum
 of work.
 


Oh yeah, I forgot to reply to that part.  I think you are forgetting
that you can use a database to generate a static page.  That is, only
regenerate the static page when the data within the database changes.
Again, this is another example of efficient application caching.  If you
have an application which listens for your cache invalidation event, you
can then recreate your static page.  Again, database result-set caching
is not required.  And again, this should be significantly faster than
MySQL's result-set caching.  It's also worth noting that you could then
gzip your static page (keeping both the compressed and uncompressed
copies), resulting in yet another optimization for most web servers
and browsers.
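
One hedged way to wire that up, with hypothetical names throughout: a
statement-level trigger queues the stale page and a NOTIFY wakes the
page builder, which regenerates and re-gzips only what was queued.

    -- hypothetical names; "dirty_pages" is the rebuild queue
    CREATE TABLE dirty_pages (page_name text PRIMARY KEY);

    CREATE FUNCTION queue_page_rebuild() RETURNS trigger AS $$
    BEGIN
        INSERT INTO dirty_pages VALUES ('product_listing')
            ON CONFLICT (page_name) DO NOTHING;  -- already queued is fine
        NOTIFY pages_dirty;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER products_page_stale
        AFTER INSERT OR UPDATE OR DELETE ON products
        FOR EACH STATEMENT
        EXECUTE FUNCTION queue_page_rebuild();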

Greg






Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Greg Copeland

On Sat, 2002-03-16 at 08:36, mlw wrote:
 Triggers and asynchronous notification are not substitutes for real hard ACID
 complient caching. The way you suggest implies only one access model. Take the
 notion of a library, they have both web and application access. These should
 both be able to use the cache.
 

Well, obviously, you'd need to re-implement the client-side cache in
each implementation of the client.  That is a downside, and I certainly
won't argue it.  As for the "no substitute" comment, I guess I'll
plead ignorance, because I'm not sure what I'm missing here.  What am I
missing that would not be properly covered by that model?

 Also, your suggestion does not address the sub-select case, which I think is
 much bigger, performance wise, and more efficient than MySQL's cache.

I'm really not sure what you mean by that.  Doesn't address it but is
more efficient?  Maybe it's because I've not had my morning coffee
yet... ;)

 
 This whole discussion could be moot, and this could be developed as an
 extension, if there were a function API that could return sets of whole rows.
 

Maybe...but you did ask for feedback.  :)

Greg








Re: [HACKERS] Again, sorry, caching.

2002-03-16 Thread Stephan Szabo

 I was thinking that it could be implemented as a keyword or comment in a query.
 Such as:

 select * from table where column = 'foo' cacheable
 or
 select * from table where column = 'bar' /* cacheable */


 Having caching within PostgreSQL will be good for data integrity. Application
 caches can't tell when an update/delete/insert happens, so they often have to
 use a time-out mechanism.

 OK, let me have it: tell me how terrible an idea this is, tell me how wrong I
 am.

I don't think it's a bad idea, but a cache that takes a query string (or
subquery string) and looks for a match based on that is flawed without
special consideration of non-cacheable functions and constructs
(CURRENT_USER, things that depend on timezone, things that depend on
datestyle).  We'd also need to work out an appropriate mechanism for
cache invalidation and for adding things to the cache.
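
To make the hazard concrete, a few statements whose text alone cannot
safely key a cache (a sketch in present-day syntax):

    SELECT CURRENT_USER;        -- byte-identical text, different answer
                                -- for each connected role

    SELECT now();               -- different answer on every execution

    SET datestyle = 'German';
    SELECT '2002-03-16'::date;  -- same query text as before the SET,
                                -- but now renders as 16.03.2002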

