Re: [PERFORM] Postgres vs. DSpam
FWIW, those queries won't be able to use an index. A better WHERE clause would be: AND last_hit CURRENT_DATE - 60 On Fri, Nov 26, 2004 at 02:37:12PM +1300, Andrew McMillan wrote: On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with a quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular. I am using DSpam with PostgreSQL here. I have a daily job that cleans the DSpam database up, as follows: DELETE FROM dspam_token_data WHERE (innocent_hits*2) + spam_hits 5 AND CURRENT_DATE - last_hit 60; DELETE FROM dspam_token_data WHERE innocent_hits = 1 AND CURRENT_DATE - last_hit 30; DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit 180; DELETE FROM dspam_signature_data WHERE CURRENT_DATE - created_on 14; VACUUM dspam_token_data; VACUUM dspam_signature_data; I also occasionally do a VACUUM FULL ANALYZE; on the database as well. In all honesty though, I think that MySQL is better suited to DSpam than PostgreSQL is. Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? I believe that what DSpam does that is not well-catered for in the way PostgreSQL operates, is that it does very frequent updates to rows in (eventually) quite large tables. In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. The reason that PostgreSQL operates this way, is a direct result of the way transactional support is implemented, and it may well change in a version or two. It's got better over the last few versions, with things like pg_autovacuum, but that approach still doesn't suit some types of database updating. Cheers, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!! - -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Postgres vs. DSpam
On Wed, 2004-11-24 at 14:14 +0100, Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with a quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular. I am using DSpam with PostgreSQL here. I have a daily job that cleans the DSpam database up, as follows: DELETE FROM dspam_token_data WHERE (innocent_hits*2) + spam_hits 5 AND CURRENT_DATE - last_hit 60; DELETE FROM dspam_token_data WHERE innocent_hits = 1 AND CURRENT_DATE - last_hit 30; DELETE FROM dspam_token_data WHERE CURRENT_DATE - last_hit 180; DELETE FROM dspam_signature_data WHERE CURRENT_DATE - created_on 14; VACUUM dspam_token_data; VACUUM dspam_signature_data; I also occasionally do a VACUUM FULL ANALYZE; on the database as well. In all honesty though, I think that MySQL is better suited to DSpam than PostgreSQL is. Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? I believe that what DSpam does that is not well-catered for in the way PostgreSQL operates, is that it does very frequent updates to rows in (eventually) quite large tables. In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. The reason that PostgreSQL operates this way, is a direct result of the way transactional support is implemented, and it may well change in a version or two. It's got better over the last few versions, with things like pg_autovacuum, but that approach still doesn't suit some types of database updating. Cheers, Andrew. - Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267 These PRESERVES should be FORCE-FED to PENTAGON OFFICIALS!! - signature.asc Description: This is a digitally signed message part
Re: [PERFORM] Postgres vs. DSpam
On Fri, 2004-11-26 at 14:37 +1300, Andrew McMillan wrote: In PostgreSQL the UPDATE will result internally in a new record being written, with the old record being marked as deleted. That old record won't be re-used until after a VACUUM has run, and this means that the on-disk tables will have a lot of dead rows in them quite quickly. Not necessarily: yes, you need a VACUUM to begin reusing the space consumed by expired tuples, but that does not mean tables will have a lot of dead rows in them quite quickly. VACUUM does not block concurrent database activity, so you can run it as frequently as you'd like (and as your database workload requires). There is a tradeoff between the space consumed by expired tuple versions and the I/O required to do a VACUUM -- it's up to the PG admin to decide what the right balance for their database is (pg_autovacuum et al. can help make this decision). The reason that PostgreSQL operates this way, is a direct result of the way transactional support is implemented, and it may well change in a version or two. I doubt it. -Neil ---(end of broadcast)--- TIP 8: explain analyze is your friend