Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Gavin Sherry
On Thu, 28 Jul 2005, Matthew Schumacher wrote: Gavin Sherry wrote: I had a look at your data -- thanks. I have a question though: put_token() is invoked 120596 times in your benchmark... for 616 messages. That's nearly 200 queries (not even counting the 1-8 (??) inside the function

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Dennis Bjorklund
On Wed, 27 Jul 2005, Matthew Schumacher wrote: Then they do this to insert the token: INSERT INTO bayes_token ( id, token, spam_count, ham_count, atime ) VALUES ( ?, ?, ?, ?, ? ) ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + ?, 0),

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Luke Lonergan
work_mem = 131072 # min 64, size in KB
shared_buffers = 16000 # min 16, at least max_connections*2, 8KB each
checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
effective_cache_size = 75 # typically 8KB each
fsync=false # turns

Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-07-29 Thread Bruce Momjian
Luke Lonergan wrote: Mark, On 7/28/05 4:43 PM, Mark Wong [EMAIL PROTECTED] wrote: Are there any recommendations for Qlogic controllers on Linux, SCSI or Fibre Channel? I might be able to get my hands on some. I have PCI-X slots for AMD, Itanium, or POWER5 if the architecture makes a

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Alvaro Herrera
On Fri, Jul 29, 2005 at 03:01:07AM -0400, Luke Lonergan wrote: I guess we see the real culprit here. Anyone surprised it's the WAL? So what? Are you planning to suggest that people set fsync=false? I just had a person lose 3 days of data on some tables because of that, even when checkpoints

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Tom Lane
Luke Lonergan [EMAIL PROTECTED] writes: I guess we see the real culprit here. Anyone surprised it's the WAL? You have not proved that at all. I haven't had time to look at Matthew's problem, but someone upthread implied that it was doing a separate transaction for each word. If so, collapsing
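
To make the "separate transaction for each word" point concrete, here is a minimal sketch of grouping all of one message's token updates into a single transaction. It uses Python's built-in sqlite3 module purely as a runnable stand-in for PostgreSQL; the two-column table, the learn_spam() helper and the sample words are assumptions made up for illustration, not the SpamAssassin schema or code. Whether this actually helps in Matthew's case is exactly what the thread is still debating.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so each
# statement is its own transaction unless we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE bayes_token ("
    " token TEXT PRIMARY KEY,"
    " spam_count INTEGER NOT NULL)"
)


def learn_spam(conn, words):
    """Count every word of one message inside a single transaction."""
    conn.execute("BEGIN")
    try:
        for word in words:
            cur = conn.execute(
                "UPDATE bayes_token SET spam_count = spam_count + 1"
                " WHERE token = ?",
                (word,),
            )
            if cur.rowcount == 0:
                # No existing row for this word, so create it.
                conn.execute(
                    "INSERT INTO bayes_token (token, spam_count)"
                    " VALUES (?, 1)",
                    (word,),
                )
        conn.execute("COMMIT")  # one commit per message, not per word
    except Exception:
        conn.execute("ROLLBACK")
        raise


learn_spam(conn, ["cheap", "offer", "cheap"])
counts = dict(conn.execute("SELECT token, spam_count FROM bayes_token"))
```

With one BEGIN/COMMIT pair per message, the commit cost is paid once per message rather than once per word; if any statement fails, the whole message is rolled back.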

Re: [PERFORM] BUG #1797: Problem using Limit in a function, seqscan

2005-07-29 Thread Bruno Wolff III
On Fri, Jul 29, 2005 at 13:52:45 +0100, Magno Leite [EMAIL PROTECTED] wrote: Description: Problem using Limit in a function, seqscan I looked for this problem in the bug reports but couldn't find it. This is my problem: when I try to use LIMIT in a function, PostgreSQL doesn't use my

[PERFORM] Performance problems on 4/8way Opteron (dualcore) HP DL585

2005-07-29 Thread Dirk Lutzebäck
Hi, does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? We run RHEL 3.0, 32-bit, and under high load it is a drag. We mostly run memory-demanding queries. Context switches average around 20,000, with no cs spikes when we run many processes in

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Josh Berkus
Luke, work_mem = 131072 # min 64, size in KB Incidentally, this is much too high for an OLTP application, although I don't think this would have affected the test. shared_buffers = 16000 # min 16, at least max_connections*2, 8KB each checkpoint_segments = 128 #

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Luke Lonergan
Tom, On 7/29/05 7:12 AM, Tom Lane [EMAIL PROTECTED] wrote: Luke Lonergan [EMAIL PROTECTED] writes: I guess we see the real culprit here. Anyone surprised it's the WAL? You have not proved that at all. As Alvaro pointed out, fsync has impact on more than WAL, so good point. Interesting

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Luke Lonergan
Alvaro, On 7/29/05 6:23 AM, Alvaro Herrera [EMAIL PROTECTED] wrote: On Fri, Jul 29, 2005 at 03:01:07AM -0400, Luke Lonergan wrote: I guess we see the real culprit here. Anyone surprised it's the WAL? So what? Are you planning to suggest that people set fsync=false? That's not the

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Josh Berkus
Dennis, EXCEPTION WHEN unique_violation THEN I seem to remember that catching an exception in a PL/pgSQL procedure carried a large performance cost. It'd be better to do UPDATE ... IF NOT FOUND. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
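
For readers following along, the "UPDATE ... IF NOT FOUND" idea can be sketched as below: try the UPDATE first and INSERT only when it matched no row, so no exception handler is needed on the common path. This is a hedged illustration, not the function from the thread. It runs against Python's built-in sqlite3 as a stand-in for PostgreSQL (where the equivalent check in PL/pgSQL would be IF NOT FOUND after the UPDATE); the column names follow the INSERT statement quoted in Dennis Bjorklund's message, the clamp at zero mirrors its GREATEST(..., 0), and the put_token() signature and the (id, token) primary key are assumptions.

```python
import sqlite3


def put_token(conn, user_id, token, spam_delta, ham_delta, atime):
    """UPDATE first; INSERT only when the UPDATE matched no row."""
    cur = conn.execute(
        "UPDATE bayes_token"
        " SET spam_count = MAX(spam_count + ?, 0),"  # two-argument MAX is
        "     ham_count = MAX(ham_count + ?, 0),"    # SQLite's GREATEST
        "     atime = ?"
        " WHERE id = ? AND token = ?",
        (spam_delta, ham_delta, atime, user_id, token),
    )
    if cur.rowcount == 0:
        # Nothing was updated, so the token is new for this user.
        conn.execute(
            "INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)"
            " VALUES (?, ?, ?, ?, ?)",
            (user_id, token, max(spam_delta, 0), max(ham_delta, 0), atime),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes_token ("
    " id INTEGER NOT NULL,"
    " token BLOB NOT NULL,"
    " spam_count INTEGER NOT NULL,"
    " ham_count INTEGER NOT NULL,"
    " atime INTEGER NOT NULL,"
    " PRIMARY KEY (id, token))"  # assumed key, for illustration only
)

put_token(conn, 1, b"abc", 1, 0, 100)    # first sighting: INSERT path
put_token(conn, 1, b"abc", 2, 0, 200)    # existing row: UPDATE path
put_token(conn, 1, b"abc", -10, 1, 300)  # spam_count is clamped at 0
row = conn.execute(
    "SELECT spam_count, ham_count, atime FROM bayes_token"
    " WHERE id = ? AND token = ?",
    (1, b"abc"),
).fetchone()
```

One caveat on the design: with concurrent writers, two sessions can both see the UPDATE match nothing and then race on the INSERT, so a unique-violation handler or a retry may still be needed on that rare path.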

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Matthew Schumacher
Andrew McMillan wrote: On Thu, 2005-07-28 at 16:13 -0800, Matthew Schumacher wrote: Ok, I finally got some test data together so that others can test without installing SA. The schema and test dataset is over at http://www.aptalaska.net/~matt.s/bayes/bayesBenchmark.tar.gz I have a pretty fast

Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP DL585

2005-07-29 Thread Josh Berkus
Dirk, does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet encountered. We run RHEL 3.0, 32bit and under high

Re: [PERFORM] [PATCHES] COPY FROM performance improvements

2005-07-29 Thread Luke Lonergan
Bruce, On 7/29/05 5:37 AM, Bruce Momjian pgman@candle.pha.pa.us wrote: Where is the most recent version of the COPY patch? My direct e-mails aren't getting to you, they are trapped in a spam filter on your end, so you didn't get my e-mail with the patch! I've attached it here, sorry to the

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread John Arbash Meinel
Josh Berkus wrote: Dennis, EXCEPTION WHEN unique_violation THEN I seem to remember that catching an exception in a PL/pgSQL procedure was a large performance cost. It'd be better to do UPDATE ... IF NOT FOUND. Actually, he was doing an implicit UPDATE IF NOT FOUND in

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Luke Lonergan
Tom, On 7/27/05 11:19 PM, Tom Lane [EMAIL PROTECTED] wrote: Matthew Schumacher [EMAIL PROTECTED] writes: After playing with various indexes and what not I simply am unable to make this procedure perform any better. Perhaps someone on the list can spot the bottleneck and reveal why this

Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP

2005-07-29 Thread Jeffrey W. Baker
On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote: Dirk, does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? I'm using dual 275s without problems. Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile

Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore)

2005-07-29 Thread J. Andrew Rogers
On 7/29/05 10:46 AM, Josh Berkus josh@agliodbs.com wrote: does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet

Re: [PERFORM] Performance problems testing with Spamassassin

2005-07-29 Thread Matthew Schumacher
OK, here is something new: when I take my data.sql file and add a BEGIN and COMMIT at the top and bottom, the benchmark is a LOT slower. My understanding is that it should be much faster, because fsync isn't called until the commit instead of on every SQL command. I must be missing something

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread PFC
Also, this test goes a bit faster with sync turned off; if MySQL isn't using sync, that would be why it's so much faster. Anyone know what the default for MySQL is? For InnoDB I think it's like Postgres (only slower); for MyISAM it's no fsync, no transactions, no crash tolerance of any

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Andrew McMillan
On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote: On my laptop this takes: real 1m33.758s user 0m4.285s sys 0m1.181s One interesting effect is the data in bayes_vars has a huge number of updates and needs vacuum _frequently_. After the run a vacuum full

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Tom Lane
Andrew McMillan [EMAIL PROTECTED] writes: On Fri, 2005-07-29 at 09:37 -0800, Matthew Schumacher wrote: How often should this table be vacuumed, every 5 minutes? I would be tempted to vacuum after each e-mail, in this case. Perhaps the bulk of the transient states should be done in a temp

Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-29 Thread Matthew Schumacher
OK, here is where I'm at. I reduced the proc down to this: CREATE FUNCTION update_token (_id INTEGER, _token BYTEA, _spam_count INTEGER, _ham_count INTEGER, _atime INTEGER)