Re: [PERFORM] Postgres vs. DSpam

2004-11-29 Thread Jim C. Nasby
FWIW, those queries won't be able to use an index. A better WHERE clause
would be:

AND last_hit  CURRENT_DATE - 60

On Fri, Nov 26, 2004 at 02:37:12PM +1300, Andrew McMillan wrote:
 On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
  Hi,
  
  I have installed the dspam filter
  (http://www.nuclearelephant.com/projects/dspam) on our mail server
  (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
  with a quite low traffic of 4000 messages/day. So it's a quite common
  platform/environment, nothing spectacular.
 
 I am using DSpam with PostgreSQL here.  I have a daily job that cleans
 the DSpam database up, as follows:
 
 DELETE FROM dspam_token_data
   WHERE (innocent_hits*2) + spam_hits  5
   AND CURRENT_DATE - last_hit  60;
 
 DELETE FROM dspam_token_data
   WHERE innocent_hits = 1
   AND CURRENT_DATE - last_hit  30;
 
 DELETE FROM dspam_token_data
   WHERE CURRENT_DATE - last_hit  180;
 
 DELETE FROM dspam_signature_data
   WHERE CURRENT_DATE - created_on  14;
 
 VACUUM dspam_token_data;
 
 VACUUM dspam_signature_data;
 
 
 
 I also occasionally do a VACUUM FULL ANALYZE; on the database as well.
 
 
 In all honesty though, I think that MySQL is better suited to DSpam than
 PostgreSQL is.
 
 
  Please, could anyone explain me this difference?
  Is Postgres that bad?
  Is MySQL that good?
  Am I the only one to have observed this behavior?
 
 I believe that what DSpam does that is not well-catered for in the way
 PostgreSQL operates, is that it does very frequent updates to rows in
 (eventually) quite large tables.  In PostgreSQL the UPDATE will result
 internally in a new record being written, with the old record being
 marked as deleted.  That old record won't be re-used until after a
 VACUUM has run, and this means that the on-disk tables will have a lot
 of dead rows in them quite quickly.
 
 The reason that PostgreSQL operates this way, is a direct result of the
 way transactional support is implemented, and it may well change in a
 version or two.  It's got better over the last few versions, with things
 like pg_autovacuum, but that approach still doesn't suit some types of
 database updating.
 
 Cheers,
   Andrew.
 -
 Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
 WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
 DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
 -
 



-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Postgres vs. DSpam

2004-11-25 Thread Andrew McMillan
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote:
 Hi,
 
 I have installed the dspam filter
 (http://www.nuclearelephant.com/projects/dspam) on our mail server
 (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users
 with a quite low traffic of 4000 messages/day. So it's a quite common
 platform/environment, nothing spectacular.

I am using DSpam with PostgreSQL here.  I have a daily job that cleans
the DSpam database up, as follows:

DELETE FROM dspam_token_data
  WHERE (innocent_hits*2) + spam_hits  5
  AND CURRENT_DATE - last_hit  60;

DELETE FROM dspam_token_data
  WHERE innocent_hits = 1
  AND CURRENT_DATE - last_hit  30;

DELETE FROM dspam_token_data
  WHERE CURRENT_DATE - last_hit  180;

DELETE FROM dspam_signature_data
  WHERE CURRENT_DATE - created_on  14;

VACUUM dspam_token_data;

VACUUM dspam_signature_data;



I also occasionally do a VACUUM FULL ANALYZE; on the database as well.


In all honesty though, I think that MySQL is better suited to DSpam than
PostgreSQL is.


 Please, could anyone explain me this difference?
 Is Postgres that bad?
 Is MySQL that good?
 Am I the only one to have observed this behavior?

I believe that what DSpam does that is not well-catered for in the way
PostgreSQL operates, is that it does very frequent updates to rows in
(eventually) quite large tables.  In PostgreSQL the UPDATE will result
internally in a new record being written, with the old record being
marked as deleted.  That old record won't be re-used until after a
VACUUM has run, and this means that the on-disk tables will have a lot
of dead rows in them quite quickly.

The reason that PostgreSQL operates this way, is a direct result of the
way transactional support is implemented, and it may well change in a
version or two.  It's got better over the last few versions, with things
like pg_autovacuum, but that approach still doesn't suit some types of
database updating.

Cheers,
Andrew.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!!
-



signature.asc
Description: This is a digitally signed message part


Re: [PERFORM] Postgres vs. DSpam

2004-11-25 Thread Neil Conway
On Fri, 2004-11-26 at 14:37 +1300, Andrew McMillan wrote:
 In PostgreSQL the UPDATE will result
 internally in a new record being written, with the old record being
 marked as deleted.  That old record won't be re-used until after a
 VACUUM has run, and this means that the on-disk tables will have a lot
 of dead rows in them quite quickly.

Not necessarily: yes, you need a VACUUM to begin reusing the space
consumed by expired tuples, but that does not mean tables will have a
lot of dead rows in them quite quickly. VACUUM does not block
concurrent database activity, so you can run it as frequently as you'd
like (and as your database workload requires). There is a tradeoff
between the space consumed by expired tuple versions and the I/O
required to do a VACUUM -- it's up to the PG admin to decide what the
right balance for their database is (pg_autovacuum et al. can help make
this decision).

 The reason that PostgreSQL operates this way, is a direct result of the
 way transactional support is implemented, and it may well change in a
 version or two.

I doubt it.

-Neil



---(end of broadcast)---
TIP 8: explain analyze is your friend