Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP

2005-07-31 Thread Dirk Lutzebäck
Does anybody know whether Red Hat already supports this patch in an
enterprise version?


Regards,

Dirk



J. Andrew Rogers wrote:

On 7/29/05 10:46 AM, Josh Berkus <josh@agliodbs.com> wrote:


does anybody have experience with this machine (4x 875 dual core Opteron
CPUs)?


Nope.   I suspect that you may be the first person to report in on
dual-cores.  There may be special compile issues with dual-cores that
we've not yet encountered.




There was recently a discussion of similar problems on a couple of the
supercomputing lists, regarding surprisingly substandard performance from
large dual-core Opteron installations.

The problem, as I remember it, boiled down to the Linux kernel handling
memory/process management very badly on large dual-core systems --
pathological NUMA behavior.  However, this problem has apparently been fixed
in Linux v2.6.12+, and using the more recent kernel on large dual-core
systems generated *massive* performance improvements for the individuals
with this issue.  With the patched kernel, one gets the performance most
people were expecting.

The v2.6.12+ kernels are a bit new, but they contain a very important
performance patch for systems like the one above.  It would definitely be
worth testing if possible.


J. Andrew Rogers







Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP

2005-07-31 Thread Dirk Lutzebäck

Hi Jeff,

Which box are you running, precisely, and which OS/kernel?

We need to run 32-bit because we need failover to a 32-bit Xeon system 
(DL580). If this does not work out we will probably need to switch to 64-bit 
(dump/restore) and run another 64-bit failover box too.


Regards,

Dirk



Jeffrey W. Baker wrote:

On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote:


Dirk,



does anybody have experience with this machine (4x 875 dual core Opteron
CPUs)?



I'm using dual 275s without problems.


Nope.   I suspect that you may be the first person to report in on 
dual-cores.  There may be special compile issues with dual-cores that 
we've not yet encountered.



Doubtful.  However, you could see improvements using recent Linux kernel
code.  There have been some patches for optimizing scheduling and memory
allocations.

However, if you are running this machine in 32-bit mode, why did you
bother paying $14,000 for your CPUs?  You will get FAR better
performance in 64-bit mode.  64-bit mode will give you 30-50% better
performance on PostgreSQL loads, in my experience.  Also, if I remember
correctly, the 32-bit x86 kernel doesn't understand Opteron NUMA
topology, so you may be seeing poor memory allocation decisions.

-jwb


We run RHEL 3.0, 32-bit, and under high load it is a drag. We 
mostly run memory-demanding queries. Context switches hover
around 20,000 on average, with no cs spikes when we run many processes in
parallel. Actually we only see two processes in the running state! When
there are only a few processes running, context switches go much higher.
At the moment we are much slower than with a 4-way Xeon box (DL580).


Um, that was a bit incoherent.  Are you seeing a CS storm or aren't you?






Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore)

2005-07-31 Thread J. Andrew Rogers
On 7/30/05 12:57 AM, William Yu [EMAIL PROTECTED] wrote:
 I haven't investigated the 2.6.12+ kernel updates yet -- I probably will
 do our development servers first to give it a test.


The kernel updates make the NUMA code dual-core aware, which apparently
makes a big difference in some cases but not in others.  This makes some
sense, since multi-processor multi-core machines have two different types
of non-locality to manage instead of just one.  Prior to the v2.6.12
patches, a dual-core dual-proc machine was treated as a quad-proc machine.

The closest thing to a supported v2.6.12 kernel that I know of is FC4, which
is not really supported in the enterprise sense of course.


J. Andrew Rogers





Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread John Arbash Meinel
John Arbash Meinel wrote:

Matthew Schumacher wrote:

All it's doing is trying the update before the insert to get around the
problem of not knowing which is needed.  With only 2-3 of the queries
implemented I'm already back to running about the same speed as the
original SA proc that is going to ship with SA 3.1.0.

All of the queries are using indexes so at this point I'm pretty
convinced that the biggest problem is the sheer number of queries
required to run this proc 200 times for each email (once for each token).

I don't see anything that could be done to make this much faster on the
postgres end; it's looking like the solution is going to involve cutting
down the number of queries somehow.

One thing that is still very puzzling to me is why this runs so much
slower when I put the data.sql in a transaction.  Obviously transactions
act differently when you call a proc a zillion times vs. an insert
query.
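
(Per token, the proc's logic is presumably something like the following
sketch -- the table and column names are invented here, and the
only-if-no-row-was-updated branch lives inside the stored procedure:)

UPDATE tokens SET cnt = cnt + 1 WHERE key = 'aaa';
-- and, only if that UPDATE matched zero rows:
INSERT INTO tokens (key, cnt) VALUES ('aaa', 1);
-- ...repeated for each of the ~200 tokens in an email, one call per token.
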
Well, I played with adding a COMMIT;BEGIN; statement to your exact test
every 1000 lines. And this is what I got:

Just for reference, I also tested this on my old server, which is a dual
Celeron 450 with 256MB RAM, running FC4 and Postgres 8.0.3.
Unmodified:
real    54m15.557s
user    0m24.328s
sys     0m14.200s

With Transactions every 1000 selects, and vacuum every 5000:
real    8m36.528s
user    0m16.585s
sys     0m12.569s

With Transactions every 1000 selects, and vacuum every 10000:
real    7m50.748s
user    0m16.183s
sys     0m12.489s

On this machine vacuum is more expensive, since it doesn't have as much RAM.

Anyway, on this machine, I see an approximately 7x improvement, which I
think is probably going to satisfy your spamassassin needs.
John
=:-

PS Looking forward to having a spamassassin that can utilize my
favorite db. Right now, I'm not using a db backend because it wasn't
worth setting up mysql.

Unmodified:
real    17m53.587s
user    0m6.204s
sys     0m3.556s

With BEGIN/COMMIT:
real    1m53.466s
user    0m5.203s
sys     0m3.211s

So I see the potential for an almost 10-fold improvement by switching to
transactions. I played with the perl script (and re-implemented it in
python), and for the same data as the perl script, using COPY instead of
INSERT INTO means 5s instead of 33s.
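
(For concreteness, the three loading styles being compared, sketched
against a hypothetical tokens table -- the table and column names are
invented for illustration:)

-- 1. Autocommit: every INSERT is its own transaction (slowest).
INSERT INTO tokens (key, cnt) VALUES ('aaa', 1);

-- 2. Batched: one commit per ~1000 statements.
BEGIN;
INSERT INTO tokens (key, cnt) VALUES ('aaa', 1);
-- ... ~999 more inserts ...
COMMIT;

-- 3. COPY: a single command loads the whole batch (fastest here).
COPY tokens (key, cnt) FROM STDIN;
aaa	1
bbb	1
\.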

I also played around with adding VACUUM ANALYZE every 10 COMMITS, which
brings the speed to:

real    1m41.258s
user    0m5.394s
sys     0m3.212s

And doing VACUUM ANALYZE every 5 COMMITS makes it:
real    1m46.403s
user    0m5.597s
sys     0m3.244s

I'm assuming the slowdown is because of the extra time spent vacuuming.
Overall performance might still be improving, since you wouldn't
actually be inserting all 100k rows at once.
...

This is all run on Ubuntu, with postgres 7.4.7, and a completely
unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with
3GB of RAM).

John
=:-
  






Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread Matthew Schumacher
Ok, here is the current plan.

Change the spamassassin API to pass a hash of tokens into the storage
module, pass the tokens to the proc as an array, start a transaction,
load the tokens into a temp table using COPY, insert the distinct new
tokens into the token table, update the token table for known
tokens, then commit.

This solves the following problems:

1.  Each email is a transaction instead of each token.
2.  The update statement is only called when we really need an update,
which avoids all of those searches.
3.  The looping work is done inside the proc instead of perl calling a
method a zillion times per email.

I'm not sure how vacuuming will be done yet; if we vacuum once per email
that may be too often, so I may do that every 5 minutes in cron.

schu
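
(A rough sketch of that per-email transaction -- the table and column
names are invented for illustration, not SpamAssassin's actual schema:)

BEGIN;

-- Load this email's tokens in one shot; the temp table vanishes at commit.
CREATE TEMP TABLE email_tokens (key text, cnt int) ON COMMIT DROP;
COPY email_tokens FROM STDIN;
aaa	2
bbb	1
\.

-- ... insert new tokens / update known tokens from email_tokens here;
-- the follow-ups below discuss how best to order those two steps ...

COMMIT;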



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread Jim C. Nasby
On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
 Ok, here is the current plan.
 
 Change the spamassassin API to pass a hash of tokens into the storage
 module, pass the tokens to the proc as an array, start a transaction,
 load the tokens into a temp table using COPY, insert the distinct new
 tokens into the token table, update the token table for known
 tokens, then commit.

You might consider:

UPDATE tokens SET ...
FROM temp_table
WHERE tokens.key = temp_table.key;  -- this updates existing records

INSERT INTO tokens
SELECT ...
FROM temp_table
WHERE key NOT IN (SELECT key FROM tokens);

This way you don't do an update to newly inserted tokens, which helps
keep vacuuming needs in check.
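
(Filled in with the thread's key column plus a hypothetical cnt counter,
that pattern might read:)

-- First update the tokens that already exist.
UPDATE tokens SET cnt = tokens.cnt + t.cnt
FROM temp_table t
WHERE tokens.key = t.key;

-- Then insert only the missing ones.  Because they arrive after the
-- UPDATE, no freshly inserted row is touched again, so there is less
-- dead-tuple churn for VACUUM to clean up.
INSERT INTO tokens (key, cnt)
SELECT key, cnt FROM temp_table
WHERE key NOT IN (SELECT key FROM tokens);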

 This solves the following problems:
 
 1.  Each email is a transaction instead of each token.
 2.  The update statement is only called when we really need an update,
 which avoids all of those searches.
 3.  The looping work is done inside the proc instead of perl calling a
 method a zillion times per email.
 
 I'm not sure how vacuuming will be done yet; if we vacuum once per email
 that may be too often, so I may do that every 5 minutes in cron.

I would suggest leaving an option to have SA vacuum every n emails,
since some people may not want to mess with cron, etc. I suspect that
pg_autovacuum would be able to keep up with things pretty well, though.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread Andreas Pflug

Jim C. Nasby wrote:

On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:


Ok, here is the current plan.

Change the spamassassin API to pass a hash of tokens into the storage
module, pass the tokens to the proc as an array, start a transaction,
load the tokens into a temp table using COPY, insert the distinct new
tokens into the token table, update the token table for known
tokens, then commit.



You might consider:

UPDATE tokens SET ...
FROM temp_table
WHERE tokens.key = temp_table.key;  -- this updates existing records

INSERT INTO tokens
SELECT ...
FROM temp_table
WHERE key NOT IN (SELECT key FROM tokens);

This way you don't do an update to newly inserted tokens, which helps
keep vacuuming needs in check.


The subselect might be quite a big set, so avoiding a full table scan 
and materialization by


DELETE FROM temp_table
  WHERE key IN (SELECT key FROM tokens);
INSERT INTO tokens SELECT * FROM temp_table;

or

INSERT INTO tokens
SELECT temp_table.* FROM temp_table LEFT JOIN tokens USING (key)
WHERE tokens.key IS NULL;

might be an additional win, assuming that only a small fraction of 
tokens is inserted and updated.


Regards,
Andreas



Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0

2005-07-31 Thread Tom Lane
Michael Parker [EMAIL PROTECTED] writes:
 sub bytea_esc {
   my ($str) = @_;
   my $buf = "";
   foreach my $char (split(//,$str)) {
     if    (ord($char) == 0)  { $buf .= "\\\\000"; }
     elsif (ord($char) == 39) { $buf .= "\\\\047"; }
     elsif (ord($char) == 92) { $buf .= "\\\\134"; }
     else  { $buf .= $char; }
   }
   return $buf;
 }

Oh, I see the problem: you forgot to convert  to a backslash sequence.

It would probably also be wise to convert anything >= 128 to a backslash
sequence, so as to avoid any possible problems with multibyte character
encodings.  You wouldn't see this issue in a SQL_ASCII database, but I
suspect it would rise up to bite you with other encoding settings.
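
(For reference, the octal escape form in question, under the
backslash-interpreting string literals of that era -- each \nnn inside a
bytea literal stands for one byte:)

-- '\\047' in the SQL text reaches the bytea input routine as \047,
-- which decodes to byte 0x27, i.e. the ' character.
SELECT '\\047'::bytea AS quote_byte,
       '\\134'::bytea AS backslash_byte,
       '\\000'::bytea AS nul_byte;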

regards, tom lane
