Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP
Does anybody know if RedHat is already supporting this patch in an enterprise version?

Regards,
Dirk

J. Andrew Rogers wrote:
> On 7/29/05 10:46 AM, Josh Berkus <josh@agliodbs.com> wrote:
>>> does anybody have experience with this machine (4x 875 dual core Opteron CPUs)?
>>
>> Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet encountered.
>
> There was recently a discussion of similar types of problems on a couple of the supercomputing lists, regarding surprisingly substandard performance from large dual-core Opteron installations. The problem, as I remember it, boiled down to the Linux kernel handling memory/process management very badly on large dual-core systems -- pathological NUMA behavior. However, this problem has apparently been fixed in Linux 2.6.12+, and using the more recent kernel on large dual-core systems generated *massive* performance improvements for the individuals with this issue. Using the patched kernel, one gets the performance most people were expecting.
>
> The 2.6.12+ kernels are a bit new, but they contain a very important performance patch for systems like the one above. It would definitely be worth testing if possible.
>
> J. Andrew Rogers

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP
Hi Jeff,

which box are you running precisely, and which OS/kernel? We need to run 32-bit because we need failover to a 32-bit Xeon system (DL580). If this does not work out we will probably need to switch to 64-bit (dump/restore) and run another 64-bit failover box too.

Regards,
Dirk

Jeffrey W. Baker wrote:
> On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote:
>> Dirk,
>>
>>> does anybody have experience with this machine (4x 875 dual core Opteron CPUs)?
>
> I'm using dual 275s without problems.
>
>> Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet encountered.
>
> Doubtful. However, you could see improvements using recent Linux kernel code. There have been some patches for optimizing scheduling and memory allocations.
>
> However, if you are running this machine in 32-bit mode, why did you bother paying $14,000 for your CPUs? You will get FAR better performance in 64-bit mode. 64-bit mode will give you 30-50% better performance on PostgreSQL loads, in my experience. Also, if I remember correctly, the 32-bit x86 kernel doesn't understand the Opteron NUMA topology, so you may be seeing poor memory allocation decisions.
>
>> We run RHEL 3.0, 32-bit, and under high load it is a drag. We mostly run memory-demanding queries. Context switches average around 20,000, with no cs spikes when we run many processes in parallel. Actually we only see two processes in the running state! When there are only a few processes running, context switches go much higher. At the moment we are much slower than with a 4-way Xeon box (DL580).
>
> Um, that was a bit incoherent. Are you seeing a CS storm or aren't you?
>
> -jwb
Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore)
On 7/30/05 12:57 AM, William Yu <[EMAIL PROTECTED]> wrote:
> I haven't investigated the 2.6.12+ kernel updates yet -- I will probably do our development servers first to give it a test.

The kernel updates make the NUMA code dual-core aware, which apparently makes a big difference in some cases but not in others. It makes some sense, since multi-processor, multi-core machines will have two different types of non-locality to manage instead of just one. Prior to the 2.6.12 patches, a dual-core dual-proc machine was viewed as a quad-proc machine.

The closest thing to a supported 2.6.12 kernel that I know of is FC4, which is of course not really supported in the enterprise sense.

J. Andrew Rogers
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
John Arbash Meinel wrote:
> Matthew Schumacher wrote:
>> All it's doing is trying the update before the insert to get around the problem of not knowing which is needed. With only 2-3 of the queries implemented I'm already back to running at about the same speed as the original SA proc that is going to ship with SA 3.1.0.
>>
>> All of the queries are using indexes, so at this point I'm pretty convinced that the biggest problem is the sheer number of queries required to run this proc 200 times for each email (once for each token). I don't see anything that could be done to make this much faster on the postgres end; it's looking like the solution is going to involve cutting down the number of queries somehow.
>>
>> One thing that is still very puzzling to me is why this runs so much slower when I put the data.sql in a transaction. Obviously transactions are acting differently when you call a proc a zillion times vs. an insert query.
>
> Well, I played with adding a COMMIT;BEGIN; statement to your exact test every 1000 lines. And this is what I got:
>
> Unmodified:
> real    17m53.587s
> user    0m6.204s
> sys     0m3.556s
>
> With BEGIN/COMMIT:
> real    1m53.466s
> user    0m5.203s
> sys     0m3.211s
>
> So I see the potential for an almost 10-fold improvement by switching to transactions.

Just for reference, I also tested this on my old server, which is a dual Celeron 450 with 256M RAM, running FC4 and Postgres 8.0.3.

Unmodified:
real    54m15.557s
user    0m24.328s
sys     0m14.200s

With transactions every 1000 selects, and vacuum every 5000:
real    8m36.528s
user    0m16.585s
sys     0m12.569s

With transactions every 1000 selects, and vacuum every 1:
real    7m50.748s
user    0m16.183s
sys     0m12.489s

On this machine vacuum is more expensive, since it doesn't have as much RAM. Anyway, on this machine I see approx. a 7x improvement, which I think is probably going to satisfy your spamassassin needs.

John =:-

PS: Looking forward to having a spamassassin that can utilize my favorite db. Right now, I'm not using a db backend because it wasn't worth setting up mysql.
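The batching technique John describes can be sketched in plain SQL. The table and column names below are hypothetical, not SpamAssassin's actual schema; the point is simply that explicit BEGIN/COMMIT turns ~1000 implicit single-statement transactions (one fsync each) into one:

```sql
-- Hypothetical sketch of the batching described above: one commit
-- per ~1000 statements instead of one per statement.
BEGIN;
INSERT INTO bayes_token (token, spam_count) VALUES ('token0001', 1);
INSERT INTO bayes_token (token, spam_count) VALUES ('token0002', 1);
-- ... roughly 1000 statements per batch ...
COMMIT;

-- Reclaim dead tuples periodically (e.g. every few batches),
-- as in the timed runs above:
VACUUM ANALYZE bayes_token;
```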
I played with the perl script (and re-implemented it in python), and for the same data as the perl script, using COPY instead of INSERT INTO means 5s instead of 33s.

I also played around with adding VACUUM ANALYZE every 10 COMMITs, which brings the speed to:

real    1m41.258s
user    0m5.394s
sys     0m3.212s

And doing VACUUM ANALYZE every 5 COMMITs makes it:

real    1m46.403s
user    0m5.597s
sys     0m3.244s

I'm assuming the slowdown is because of the extra time spent vacuuming. Overall performance might still be improving, since you wouldn't actually be inserting all 100k rows at once.

...

This is all run on Ubuntu, with postgres 7.4.7 and a completely unchanged postgresql.conf. (But the machine is a dual P4 2.4GHz, with 3GB of RAM.)

John =:-
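The COPY speedup measured above comes from sending all rows in a single statement, avoiding per-row parse/plan/network overhead. A sketch of the equivalent bulk load (hypothetical table; COPY's default text format is tab-separated, one row per line, terminated by `\.`):

```sql
-- One COPY replaces thousands of individual INSERTs.
COPY bayes_token (token, spam_count, ham_count) FROM STDIN;
token0001	1	0
token0002	0	3
\.
```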
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
Ok, here is the current plan.

Change the spamassassin API to pass a hash of tokens into the storage module, pass the tokens to the proc as an array, start a transaction, load the tokens into a temp table using COPY, select the distinct tokens into the token table for new tokens, update the token table for known tokens, then commit.

This solves the following problems:

1. Each email is a transaction instead of each token.
2. The update statement is only called when we really need an update, which avoids all of those searches.
3. The looping work is done inside the proc instead of perl calling a method a zillion times per email.

I'm not sure how vacuuming will be done yet; if we vacuum once per email that may be too often, so I may do that every 5 mins in cron.

schu
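The per-email transaction in the plan above might look roughly like this (hypothetical table and column names; the update is done before the insert here so that freshly inserted rows aren't immediately touched a second time):

```sql
BEGIN;

-- 1. Bulk-load this email's tokens into a temp table via COPY.
CREATE TEMP TABLE email_tokens (token bytea) ON COMMIT DROP;
COPY email_tokens (token) FROM STDIN;
-- ...one token per line, terminated by \. ...

-- 2. Bump the counters for tokens we have already seen.
UPDATE bayes_token b
   SET spam_count = b.spam_count + 1
  FROM email_tokens e
 WHERE b.token = e.token;

-- 3. Insert the genuinely new tokens (DISTINCT because a token can
--    occur more than once in a single email).
INSERT INTO bayes_token (token, spam_count)
SELECT DISTINCT e.token, 1
  FROM email_tokens e
 WHERE NOT EXISTS (SELECT 1 FROM bayes_token b WHERE b.token = e.token);

COMMIT;
```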
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
On Sun, Jul 31, 2005 at 08:51:06AM -0800, Matthew Schumacher wrote:
> Ok, here is the current plan.
>
> Change the spamassassin API to pass a hash of tokens into the storage module, pass the tokens to the proc as an array, start a transaction, load the tokens into a temp table using copy, select the tokens distinct into the token table for new tokens, update the token table for known tokens, then commit.

You might consider:

    UPDATE tokens
        FROM temp_table  (this updates existing records)

    INSERT INTO tokens
        SELECT ... FROM temp_table
        WHERE NOT IN (SELECT ... FROM tokens)

This way you don't do an update to newly inserted tokens, which helps keep vacuuming needs in check.

> This solves the following problems:
>
> 1. Each email is a transaction instead of each token.
> 2. The update statement is only called when we really need an update which avoids all of those searches.
> 3. The looping work is done inside the proc instead of perl calling a method a zillion times per email.
>
> I'm not sure how vacuuming will be done yet, if we vacuum once per email that may be too often, so I may do that every 5 mins in cron.

I would suggest leaving an option to have SA vacuum every n emails, since some people may not want to mess with cron, etc. I suspect that pg_autovacuum would be able to keep up with things pretty well, though.

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?
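Spelled out against a hypothetical schema, Jim's two-statement pattern might look like this (his `WHERE NOT IN` shorthand filled in):

```sql
-- Update only the rows that already exist...
UPDATE tokens t
   SET count = t.count + n.count
  FROM temp_table n
 WHERE t.token = n.token;

-- ...then insert only the rows that don't, so freshly inserted
-- tuples are never updated in the same pass (less dead-tuple bloat
-- for VACUUM to clean up).
INSERT INTO tokens (token, count)
SELECT token, count
  FROM temp_table
 WHERE token NOT IN (SELECT token FROM tokens);
```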
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
Jim C. Nasby wrote:
> You might consider:
>
>     UPDATE tokens
>         FROM temp_table  (this updates existing records)
>
>     INSERT INTO tokens
>         SELECT ... FROM temp_table
>         WHERE NOT IN (SELECT ... FROM tokens)
>
> This way you don't do an update to newly inserted tokens, which helps keep vacuuming needs in check.

The subselect might be quite a big set, so avoiding the full table scan and materialization with

    DELETE FROM temp_table
        WHERE key IN (SELECT key FROM tokens JOIN temp_table USING (key));
    INSERT INTO tokens SELECT * FROM temp_table;

or

    INSERT INTO tokens
        SELECT temp_table.*
        FROM temp_table LEFT JOIN tokens USING (key)
        WHERE tokens.key IS NULL;

might be an additional win, assuming that only a small fraction of tokens is inserted and updated.

Regards,
Andreas
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
Michael Parker <[EMAIL PROTECTED]> writes:
> sub bytea_esc {
>     my ($str) = @_;
>     my $buf = '';
>     foreach my $char (split(//, $str)) {
>         if    (ord($char) == 0)  { $buf .= '\\\\000'; }
>         elsif (ord($char) == 39) { $buf .= '\\\\047'; }
>         elsif (ord($char) == 92) { $buf .= '\\\\134'; }
>         else                     { $buf .= $char; }
>     }
>     return $buf;
> }

Oh, I see the problem: you forgot to convert '\' to a backslash sequence.

It would probably also be wise to convert anything >= 128 to a backslash sequence, so as to avoid any possible problems with multibyte character encodings. You wouldn't see this issue in a SQL_ASCII database, but I suspect it would rise up to bite you with other encoding settings.

regards, tom lane
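For reference, the octal sequences the function emits are the standard bytea escape-input syntax of this era: each special byte becomes a backslash plus three octal digits, and the backslash is doubled because the string literal is parsed once by the SQL lexer before reaching the bytea input routine. A hypothetical psql illustration:

```sql
-- Bytes 0, 39 (') and 92 (\) written as doubled-backslash octal
-- escapes inside a quoted literal:
SELECT '\\000\\047\\134'::bytea;
```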