Re: [PERFORM] memcached and PostgreSQL

2004-11-24 Thread Jim C. Nasby
If instead of a select you do a select for update I think this would be
transaction safe. Nothing would be able to modify the data in the
database between when you do the SELECT and when you commit. If the
transaction fails the value in memcached will be correct.

Also, it's not clear if you're doing an update here or not... If you're
doing an update then this wouldn't work. You'd want to do your update,
then re-insert the value into memcached outside of the update
transaction.

On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote:
 My point was that there are two failure cases --- one where the cache 
 is
 slightly out of date compared to the db server --- these are cases 
 where
 the cache update is slightly before/after the commit.
 
 I was thinking about this and ways to minimize this even further.  Have 
 memcache clients add data and have a policy to have the database only 
 delete data.  This sets the database up as the bottleneck again, but 
 then you have a degree of transactionality that couldn't be previously 
 achieved with the database issuing replace commands.  For example:
 
 1) client checks the cache for data and gets a cache lookup failure
 2) client beings transaction
 3) client SELECTs data from the database
 4) client adds the key to the cache
 5) client commits transaction
 
 This assumes that the client won't rollback or have a transaction 
 failure.  Again, in 50M transactions, I doubt one of them would fail 
 (sure, it's possible, but that's a symptom of bigger problems: 
 memcached isn't an RDBMS).
 
 The update case being:
 
 1) client begins transaction
 2) client updates data
 3) database deletes record from memcache
 4) client commits transaction
 5) client adds data to memcache
 
 The second is
 where the cache update happens and the commit later fails, or the 
 commit
 happens and the cache update never happens.
 
 Having pgmemcache delete, not replace data addresses this second issue. 
  -sc
 
 -- 
 Sean Chittenden
 
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] memcached and PostgreSQL

2004-11-23 Thread Sean Chittenden
My point was that there are two failure cases --- one where the cache 
is
slightly out of date compared to the db server --- these are cases 
where
the cache update is slightly before/after the commit.
I was thinking about this and ways to minimize this even further.  Have 
memcache clients add data and have a policy to have the database only 
delete data.  This sets the database up as the bottleneck again, but 
then you have a degree of transactionality that couldn't be previously 
achieved with the database issuing replace commands.  For example:

1) client checks the cache for data and gets a cache lookup failure
2) client beings transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction
This assumes that the client won't rollback or have a transaction 
failure.  Again, in 50M transactions, I doubt one of them would fail 
(sure, it's possible, but that's a symptom of bigger problems: 
memcached isn't an RDBMS).

The update case being:
1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache
The second is
where the cache update happens and the commit later fails, or the 
commit
happens and the cache update never happens.
Having pgmemcache delete, not replace data addresses this second issue. 
 -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Patrick B Kelly
On Nov 21, 2004, at 11:55 PM, Sean Chittenden wrote:

This is similar to sending email in a trigger or on commit where you 
can't be certain you send email always
and only on a commit.
While this is certainly a possibility, it's definitely closer to the 
exception and not the normal instance.
While an exception, this is a very real possibility in day to day 
operations. The absence of any feedback or balancing mechanism between 
the database and cache makes it impossible to know that they are in 
sync and even a small error percentage multiplied over time will lead 
to an ever increasing likelihood of error.

More dangerous is that this discrepancy will NOT always be apparent 
because without active verification of the correctness of the cache, we 
will not know about any errors unless the error grows to an obvious 
point. The errors may cause material damage long before they become 
obvious. This is a common failure pattern with caches.


Patrick B. Kelly
--
  http://patrickbkelly.org
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Pierre-Frdric Caillaud

While an exception, this is a very real possibility in day to day  
operations. The absence of any feedback or balancing mechanism between  
the database and cache makes it impossible to know that they are in sync  
and even a small error percentage multiplied over time will lead to an  
ever increasing likelihood of error.
	Sure, but there are applications where it does not matter, and these  
applications are othen loading the database... think about displaying  
forum posts, products list in a web store, and especially category trees,  
top N queries... for all these, it does not matter if the data is a bit  
stale. For instance, a very popular forum will be cached, which is very  
important. In this case I think it is acceptable if a new post does not  
appear instantly.

	Of course, when inserting or updating data in the database, the primary  
keys and other important data should be fetched from the database and not  
the cache, which supposes a bit of application logic (for instance, in a  
forum, the display page should query the cache, but the post message  
page should query the database directly).

	Memcache can also save the database from update-heavy tasks like user  
session management. In that case sessions can be stored entirely in memory.

ON COMMIT triggers would be very useful.
More dangerous is that this discrepancy will NOT always be apparent  
because without active verification of the correctness of the cache, we  
will not know about any errors unless the error grows to an obvious  
point.

The errors may cause material damage long before they become obvious.  
This is a common failure pattern with caches.
	This is why it would be dangerous to fetch referential integrity data  
from the cache... this fits your banking example for instance.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Bruce Momjian
Pierre-Frédéric Caillaud wrote:
 
  While an exception, this is a very real possibility in day to day  
  operations. The absence of any feedback or balancing mechanism between  
  the database and cache makes it impossible to know that they are in sync  
  and even a small error percentage multiplied over time will lead to an  
  ever increasing likelihood of error.
 
   Sure, but there are applications where it does not matter, and these  
 applications are othen loading the database... think about displaying  
 forum posts, products list in a web store, and especially category trees,  
 top N queries... for all these, it does not matter if the data is a bit  
 stale. For instance, a very popular forum will be cached, which is very  
 important. In this case I think it is acceptable if a new post does not  
 appear instantly.

My point was that there are two failure cases --- one where the cache is
slightly out of date compared to the db server --- these are cases where
the cache update is slightly before/after the commit.  The second is
where the cache update happens and the commit later fails, or the commit
happens and the cache update never happens.  In these cases the cache is
out of date for the amount of time you cache the data and not expire it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] memcached and PostgreSQL

2004-11-21 Thread Bruce Momjian
Josh Berkus wrote:
 Michael,
 
  Still, it seems like a convenient way to maintain cache coherency,
  assuming that your application doesn't already have a clean way to do
  that.
 
 Precisely.The big problem with memory caching is the cache getting out of 
 sync with the database.   Updating the cache through database triggers helps 
 ameliorate that.
 
 However, our inability to pass messages with NOTIFY somewhat limits the the 
 utility of this solution   Sean wants on commit triggers, but there's some 
 major issues to work out with that.   Passing messages with NOTIFY would be 
 easier and almost as good.

The big concern I have about memcache is that because it controls
storage external to the database there is no way to guarantee the cache
is consistent with the database.  This is similar to sending email in a
trigger or on commit where you can't be certain you send email always
and only on a commit.

In the database, we mark everything we do with a transaction id and mark
the transaction id as committed in on operation.  I see no way to do
that with memcache.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] memcached and PostgreSQL

2004-11-21 Thread Josh Berkus
Bruce,

 The big concern I have about memcache is that because it controls
 storage external to the database there is no way to guarantee the cache
 is consistent with the database.  This is similar to sending email in a
 trigger or on commit where you can't be certain you send email always
 and only on a commit.

Well, some things ... ON COMMIT triggers, or messages with NOTIFY, would 
improve the accuracy by cutting down on cached aborted transactions.  

However, caching is of necessity imperfect.   Caching is a trade-off; greater 
access speed vs. perfect consistency (and any durability).There are cases 
where the access speed is more important than the consistency (or the 
durability).   The answer is to use memcached judiciously and be prepared to 
account for minor inconsistencies.

For that matter, as with other forms of cumulative asynchronous materialized 
view, it would be advisable to nightly re-build copies of data stored in 
memcached from scratch during system slow time, assuming that the data in 
memcached corresponds to a real table.  Where memcached does not correspond 
to a real table (session keys, for example), it is not a concern at all.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] memcached and PostgreSQL

2004-11-21 Thread Sean Chittenden
The big concern I have about memcache is that because it controls
storage external to the database there is no way to guarantee the cache
is consistent with the database.
I've found that letting applications add data to memcache and then 
letting the database replace or delete keys seems to be the best 
approach to minimize exactly this issue.  Having two clients update the 
cache is risky.  Using triggers or using NOTIFY + tailing logs makes 
this much more bullet proof.

This is similar to sending email in a trigger or on commit where you 
can't be certain you send email always
and only on a commit.
While this is certainly a possibility, it's definitely closer to the 
exception and not the normal instance.

In the database, we mark everything we do with a transaction id and 
mark
the transaction id as committed in on operation.  I see no way to do
that with memcache.
Correct.  With an ON COMMIT trigger, it'll be easier to have a 100% 
accurate cache.  That said, memcache does exist out side of the 
database so it's theoretically impossible to guarantee that the two are 
100% in sync.  pgmemcache goes a long way towards facilitating that the 
cache is in sync with the database, but it certainly doesn't guarantee 
it's in sync.  That being said, I haven't had any instances of it not 
being in sync since using pgmemcache (I'm quite proud of this, to be 
honest *grin*).  For critical operations such as financial 
transactions, however, I advise going to the database unless you're 
willing to swallow the financial cost of cache discrepancies.

-sc
--
Sean Chittenden
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] memcached and PostgreSQL

2004-11-18 Thread Sean Chittenden
So What does memcached offer pgsql users? It would still seem to offer
the benefit of a multi-machined cache.
Ack, I totally missed this thread.  Sorry for jumping in late.
Basically, memcached and pgmemcache offer a more technically correct 
way of implementing query caching.  MySQL's query caching is a 
disaster, IMHO.  memcached alleviates this load from the database and 
puts it elsewhere in a more optimized form.  The problem with memcached 
by itself is that you're relying on the application to invalidate the 
cache.  How many different places have to be kept in sync?  Using 
memcached, in its current form, makes relying on the application to be 
developed correctly with centralized libraries and database access 
routines.  Bah, that's a cluster f#$@ waiting to happen.

pgmemcache fixes that though so that you don't have to worry about 
invalidating the cache in every application/routine.  Instead you just 
centralize that logic in the database and automatically invalidate via 
triggers.  It's working out very well for me.

I'd be interested in success stories, fwiw.  In the next week or so 
I'll probably stick this on pgfoundry and build a proper make/release 
structure.  -sc

--
Sean Chittenden
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Mike Rylander
On 17 Nov 2004 03:08:20 -0500, Greg Stark [EMAIL PROTECTED] wrote:
 Josh Berkus [EMAIL PROTECTED] writes:
 
  So memcached becomes a very good place to stick data that's read often but 
  not
  updated often, or alternately data that changes often but is disposable.   
  An
  example of the former is a user+ACL list; and example of the latter is web
  session information ... or simple materialized views.
 
 I would like very much to use something like memcached for a materialized view
 I have. The problem is that I have to join it against other tables.
 
 I've thought about providing a SRF in postgres to read records out of
 memcached but I'm unclear it would it really help at all.
 
 Has anyone tried anything like this?

I haven't tried it yet, but I plan too.  An intersting case might be
to use plperlu to interface with memcached and store hashes in the
cache via some external process, like a CGI script.  Then just define
a TYPE for the perl SRF to return, and store the data as an array of
hashes with keys matching the TYPE.

A (perhaps useless) example could then be something like:

CREATE TYPE user_info AS ( sessionid TEXT,  userid INT, lastaccess
TIMESTAMP, lastrequest TEXT);

CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$
  use Cache::Memcached;

  my $session = shift;

  my $c = $_SHARED{memcached} || Cache::Memcached-new( {servers =
'127.0.0.1:'} );

  my $user_info = $m-get('web_access_list');

  # $user_info looks like
  # [ {userid = 5, lastrequest = 'http://...', lastaccess = localtime(),
  #sessionid = '123456789'}, { ...} ]
  # and is stored by a CGI.

  @info = grep {$$_{sessionid} eq $session} @$user_info;

  return [EMAIL PROTECTED];
$$ LANGUAGE 'plperlu';

SELECT u.username, f.lastrequest FROM users u,
get_user_info_by_session('123456789') WHERE f.userid = u.userid;


Any thoughts? 

 
 --
 greg
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faqs/FAQ.html
 


-- 
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Darcy Buskermolen
On November 16, 2004 08:00 pm, Michael Adler wrote:
 http://pugs.postgresql.org/sfpug/archives/21.html

 I noticed that some of you left coasters were talking about memcached
 and pgsql. I'm curious to know what was discussed.

 In reading about memcached, it seems that many people are using it to
 circumvent the scalability problems of MySQL (lack of MVCC).

 from their site:

 snip
 Shouldn't the database do this?

 Regardless of what database you use (MS-SQL, Oracle, Postgres,
 MysQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
 properties in a RDBMS, especially when disks are involved, which means
 queries are going to block. For databases that aren't ACID-compliant
 (like MySQL-MyISAM), that overhead doesn't exist, but reading threads
 block on the writing threads. memcached never blocks.
 /snip

 So What does memcached offer pgsql users? It would still seem to offer
 the benefit of a multi-machined cache.

Have a look at the pdf presentation found on the following site:

http://people.freebsd.org/~seanc/pgmemcache/



 -Mike

 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Michael Adler
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
 On November 16, 2004 08:00 pm, Michael Adler wrote:
  http://pugs.postgresql.org/sfpug/archives/21.html
 
  I noticed that some of you left coasters were talking about memcached
  and pgsql. I'm curious to know what was discussed.
 
 Have a look at the pdf presentation found on the following site:
 
 http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that.

That presentation was rather broad and the API seems rather general
purpose, but I wonder why you would really want access the cache by
way of the DB? If one major point of memcache is to allocate RAM to a
low-overhead server instead of to the RDBMS's disk cache, why would
you add the overhead of the RDBMS to the process?  (this is a bit of
straw man, but just trying to flesh-out the pros and cons)

Still, it seems like a convenient way to maintain cache coherency,
assuming that your application doesn't already have a clean way to do
that.

(just my uninformed opinion, though...)

-Mike

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] memcached and PostgreSQL

2004-11-17 Thread Josh Berkus
Michael,

 Still, it seems like a convenient way to maintain cache coherency,
 assuming that your application doesn't already have a clean way to do
 that.

Precisely.The big problem with memory caching is the cache getting out of 
sync with the database.   Updating the cache through database triggers helps 
ameliorate that.

However, our inability to pass messages with NOTIFY somewhat limits the the 
utility of this solution   Sean wants on commit triggers, but there's some 
major issues to work out with that.   Passing messages with NOTIFY would be 
easier and almost as good.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] memcached and PostgreSQL

2004-11-16 Thread Josh Berkus
Michael,

 So What does memcached offer pgsql users? It would still seem to offer
 the benefit of a multi-machined cache.

Yes, and a very, very fast one too ... like, 120,000 operations per second.   
PostgreSQL can't match that because of the overhead of authentication, 
security, transaction visibility checking, etc.   

So memcached becomes a very good place to stick data that's read often but not 
updated often, or alternately data that changes often but is disposable.   An 
example of the former is a user+ACL list; and example of the latter is web 
session information ... or simple materialized views.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] memcached and PostgreSQL

2004-11-16 Thread Troels Arvin
On Tue, 16 Nov 2004 21:47:54 -0800, Josh Berkus wrote:

 So memcached becomes a very good place to stick data that's read often but 
 not 
 updated often, or alternately data that changes often but is disposable.   An 
 example of the former is a user+ACL list; and example of the latter is web 
 session information ... or simple materialized views.

Has anyone tried at least two of

1. memcached
2. Tugela Cache (pretty much the same as memcached, I think)
3. Sharedance

In that case: Do you have any comparative remarks?


Links:

1: http://www.danga.com/memcached/

2: http://meta.wikimedia.org/wiki/Tugela_Cache
   http://cvs.sourceforge.net/viewcvs.py/wikipedia/tugelacache/

3: http://sharedance.pureftpd.org/

-- 
Greetings from Troels Arvin, Copenhagen, Denmark



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly