Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 15, 2006, at 04:09, Ron wrote:
> At 07:27 PM 12/14/2006, Alexander Staubo wrote:
>> Sorry, I neglected to include the pertinent graph: http://purefiction.net/paste/pgbench2.pdf
> In fact, your graph suggests that using arch-specific options in addition to -O3 actually =hurts= performance.

The difference is very slight. I'm going to run without -funroll-loops and -pipe (which are not arch-related) to get better data.

Alexander.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] New to PostgreSQL, performance considerations
On Fri, Dec 15, 2006 at 10:53:25AM +0100, Alexander Staubo wrote:
> The difference is very slight. I'm going to run without -funroll-loops and -pipe (which are not arch-related) to get better data.

-pipe does not matter for the generated code; it only affects compiler speed. (It simply means that the compiler runs the stages as a pipeline, cpp | cc1 | as, instead of writing each stage's output to a temporary file for the next stage to read.)

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] EXPLAIN ANALYZE on 8.2
On Thu, 2006-12-14 at 19:05 -0500, Tom Lane wrote:
> Kelly Burkhart [EMAIL PROTECTED] writes:
>> I hope this isn't a crummy mainboard but I can't seem to affect things by changing the clock source (kernel 2.6.16, SLES10). I tried the kernel command option clock=XXX where XXX in (cyclone, hpet, pmtmr, tsc, pit); no option was significantly better than the default.
> I believe that on machines where gettimeofday is really nice and fast, it doesn't require entry to the kernel at all; there's some hack that makes the clock readable from userspace. (Obviously a switch to kernel mode would set you back a lot of the cycles involved here.) So it's not so much the kernel that you need to teach as glibc. How you do that is beyond my expertise, but maybe that will help you google for the right thing ...

Until we work out a better solution, we can fix this in two ways:

1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
2. enable_analyze_timer = off | on (default) (USERSET)

A performance drop of 4x-10x is simply unacceptable when trying to tune queries where the current untuned time is already too high. Tying down production servers for hours on end when we know for certain all they are doing is calling gettimeofday millions of times is not good. This quickly leads to the view from objective people that PostgreSQL doesn't have a great optimizer, whatever we say in its defence. I don't want to leave this alone, but I don't want to spend a month fixing it either.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
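[The overhead Simon describes comes from EXPLAIN ANALYZE reading the clock twice per row per plan node. A rough, hypothetical sketch of how per-call timer cost multiplies out, using Python's perf_counter as a stand-in for gettimeofday(); the numbers are illustrative, not PostgreSQL measurements:]

```python
import time

def measure_timer_overhead(calls=1_000_000):
    """Estimate the cost of one high-resolution clock read by timing
    a tight loop of reads. Stands in for gettimeofday() here."""
    start = time.perf_counter()
    for _ in range(calls):
        time.perf_counter()
    elapsed = time.perf_counter() - start
    return elapsed / calls  # seconds per clock read

per_call = measure_timer_overhead()
print(f"~{per_call * 1e9:.0f} ns per clock read")
# Two reads per row: the instrumentation cost for a node touching 10M rows
print(f"added overhead for 10M rows: ~{per_call * 2 * 10_000_000:.2f} s")
```

On hardware where the clock read needs a kernel entry, the per-call figure grows by an order of magnitude or more, which is exactly the 4x-10x query-level slowdown being discussed.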
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
On Fri, Dec 15, 2006 at 10:28:08AM +0000, Simon Riggs wrote:
> Until we work out a better solution we can fix this in two ways:
> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
> 2. enable_analyze_timer = off | on (default) (USERSET)

What exactly would this do? Only count actual rows or something? I wrote a patch that tried statistical sampling, but the figures were too far off for people's liking.

> A performance drop of 4x-10x is simply unacceptable when trying to tune queries where the current untuned time is already too high. Tying down production servers for hours on end when we know for certain all they are doing is calling gettimeofday millions of times is not good. This quickly leads to the view from objective people that PostgreSQL doesn't have a great optimizer, whatever we say in its defence. I don't want to leave this alone, but I don't want to spend a month fixing it either.

I think the best option is setitimer(), but it's not POSIX so platform support is going to be patchy.

BTW, doing gettimeofday() without kernel entry is not really possible. You could use the cycle counter, but it has the problem that if you have multiple CPUs you need to calibrate the result. If the CPU goes to sleep, there is no way for the userspace process to know. Only the kernel has all the relevant information about what the time is, to get a reasonable result.

Have a nice day,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Hi list,

On Friday, 15 December 2006 at 11:50, Martijn van Oosterhout wrote:
> BTW, doing gettimeofday() without kernel entry is not really possible. You could use the cycle counter, but it has the problem that if you have multiple CPUs you need to calibrate the result. If the CPU goes to sleep, there is no way for the userspace process to know. Only the kernel has all the relevant information about what the time is, to get a reasonable result.

I remember having played with Intel's RDTSC (time stamp counter) for some timing measurements, but I have just read from several sources (including Linux kernel hackers considering its usage for the gettimeofday() implementation) that the TSC is not an accurate method to obtain elapsed-time information. Maybe some methods other than gettimeofday() are available (Lamport timestamps, as PGDG may have to consider a distributed-processing-ready EXPLAIN ANALYZE in some future) that are cheaper and accurate? After all, the discussion, as far as I understand it, is about having an accurate measure of the duration of events; knowing when they occurred in the day does not seem to be the point.

My 2¢, hoping this could be somehow helpful,
--
Dimitri Fontaine
http://www.dalibo.com/
Re: [PERFORM] New to PostgreSQL, performance considerations
At 04:54 AM 12/15/2006, Alexander Staubo wrote:
> On Dec 15, 2006, at 04:09, Ron wrote:
>> At 07:27 PM 12/14/2006, Alexander Staubo wrote:
>>> Sorry, I neglected to include the pertinent graph: http://purefiction.net/paste/pgbench2.pdf
>> In fact, your graph suggests that using arch specific options in addition to -O3 actually =hurts= performance.
> According to the tech staff, this is a Sun X4100 with a two-drive RAID 1 volume. No idea about the make of the hard drives.

http://www.sun.com/servers/entry/x4100/features.xml
So we are dealing with a 1U, 1-4 socket (which means 1-8 core) AMD Kx box with up to 32GB of ECC RAM (DDR2?) and 2 Seagate 2.5" SAS HDs.
http://www.seagate.com/cda/products/discsales/index/1,,,00.html?interface=SAS
My bet is the X4100 contains one of the 3 models of Cheetah 15K.4s. A simple du, dkinfo, whatever, will tell you which.

I'm looking more closely into exactly what the various gcc -O optimizations do on Kx's as well. 64b vs 32b gets x86-compatible code access to ~2x as many registers; and MMX or SSE instructions get you access to not only more registers, but wider ones as well. As one wit has noted, all optimization is an exercise in caching. (Terje Mathisen, one of the better assembler coders on the planet.) It seems unusual that code-generation options which give access to more registers would ever result in slower code...

Ron Peacetree
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
On Fri, 2006-12-15 at 11:50 +0100, Martijn van Oosterhout wrote:
> On Fri, Dec 15, 2006 at 10:28:08AM +0000, Simon Riggs wrote:
>> Until we work out a better solution we can fix this in two ways:
>> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
>> 2. enable_analyze_timer = off | on (default) (USERSET)
> What exactly would this do? Only count actual rows or something?

Yes. It's better to have this than nothing at all.

> I wrote a patch that tried statistical sampling, but the figures were too far off for people's liking.

Well, I like your ideas, so if you have any more... Maybe sampling every 10 rows will bring things down to an acceptable level (after the first N). You tried less than 10, didn't you? Maybe we can count how many real I/Os were required to perform each particular row, so we can adjust the time per row based upon I/Os. ISTM that sampling at too low a rate means we can't spot the effects of cache and I/O, which can often be low frequency but high impact.

> I think the best option is setitimer(), but it's not POSIX so platform support is going to be patchy.

Don't understand that. I thought that was to do with alarms and signals.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
On Fri, Dec 15, 2006 at 12:15:59PM +0000, Gregory Stark wrote:
> There are various attempts at providing better timing infrastructure at low overhead, but I'm not sure what's out there currently. I expect that to do this, what we'll have to do is invent a pg_* abstraction that has various implementations on different architectures. On Solaris it can use DTrace internally; on Linux it might have something else (or more likely several different options, depending on the age and config options of the kernel).

I think we need to move to a sampling approach. setitimer is good, except it doesn't tell you if signals have been lost. Given that they are most likely to be lost during high disk I/O, they're actually significant. I'm trying to think of a way around that. Then you don't need a cheap gettimeofday at all...

Have a nice day,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
>> I wrote a patch that tried statistical sampling, but the figures were too far off for people's liking.
> Well, I like your ideas, so if you have any more... Maybe sampling every 10 rows will bring things down to an acceptable level (after the first N). You tried less than 10, didn't you?

Yeah, it reduced the number of calls as the count got larger. It broke somewhere, though I don't quite remember why.

> Maybe we can count how many real I/Os were required to perform each particular row, so we can adjust the time per row based upon I/Os. ISTM that sampling at too low a rate means we can't spot the effects of cache and I/O, which can often be low frequency but high impact.

One idea is to sample at a fixed interval. Where the I/O cost is high, there'll be a lot of sampling points. The issue being that the final results are not totally accurate anymore.

>> I think the best option is setitimer(), but it's not POSIX so platform support is going to be patchy.
> Don't understand that. I thought that was to do with alarms and signals.

On my system the manpage says setitimer() conforms to SVr4 and 4.4BSD. I'm unsure how many of the supported platforms fall under that categorisation.

Have a nice day,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
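[The "time the first N rows, then every k-th row" idea being discussed can be sketched as follows. This is a hypothetical illustration, not the actual patch; the function name and parameters are invented for the example:]

```python
def estimate_node_time(row_durations, n_first=10, k=10):
    """Estimate a plan node's total time from a sample of per-row durations:
    always time the first n_first rows, then only every k-th row after that,
    and extrapolate the sampled mean over the untimed rows."""
    timed = list(row_durations[:n_first])   # always time the first N rows
    rest = row_durations[n_first:]
    sample = rest[::k]                      # time every k-th remaining row
    if sample:
        mean = sum(sample) / len(sample)
        return sum(timed) + mean * len(rest)
    return sum(timed)

# Uniform per-row cost: the extrapolation is exact, with only
# 10 + 99 = 109 timed rows instead of 1000 clock-read pairs.
uniform = [1.0] * 1000
print(estimate_node_time(uniform))  # 1000.0
```

The savings in clock reads scale with k; the catch, as the thread goes on to discuss, is what happens when per-row durations are not uniform.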
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
* Simon Riggs:
>> I think the best option is setitimer(), but it's not POSIX so platform support is going to be patchy.
> Don't understand that. I thought that was to do with alarms and signals.

You could use it for sampling. Every few milliseconds, you record which code is executing (possibly using a global variable which is set and reset accordingly). But this might lead to tons of interrupted system calls, and not all systems mask them, so this might not be an option for the PostgreSQL code base.

On the other hand, if performance in more recent Linux releases (from kernel.org) is acceptable, you should assume that the problem will eventually fix itself. FWIW, I see the 9x overhead on something that is close to 2.6.17 (on AMD64/Opteron), so this could be wishful thinking. 8-(

--
Florian Weimer            [EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100          tel: +49-721-96201-1
D-76133 Karlsruhe         fax: +49-721-96201-99
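[Florian's "record which code is executing every few milliseconds" scheme can be sketched with an interval timer and a signal handler. A minimal, hypothetical sketch (Unix only) using Python's signal module rather than the C setitimer() API; the node name and globals are illustrative, not PostgreSQL's executor:]

```python
import signal
import time

current_node = None   # the executor would set/reset this around each plan node
samples = {}          # node name -> number of timer ticks observed there

def on_tick(signum, frame):
    # Runs every time the interval timer fires; attribute the tick
    # to whatever "plan node" is currently executing.
    if current_node is not None:
        samples[current_node] = samples.get(current_node, 0) + 1

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # tick every 10 ms

current_node = "Seq Scan"
deadline = time.monotonic() + 0.1
while time.monotonic() < deadline:
    pass                                          # busy work standing in for execution

signal.setitimer(signal.ITIMER_REAL, 0, 0)        # stop the timer
current_node = None
print(samples)
```

Time spent in each node is then proportional to its tick count, with no clock read per row at all. The caveats raised in the thread apply: ticks can be lost under heavy I/O, and a 100Hz timer resolves nothing for millisecond-scale queries.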
Re: [PERFORM] New to PostgreSQL, performance considerations
On 12/15/06, Ron [EMAIL PROTECTED] wrote:
> I'm looking more closely into exactly what the various gcc -O optimizations do on Kx's as well. 64b vs 32b gets x86-compatible code access to ~2x as many registers; and MMX or SSE instructions get you access to not only more registers, but wider ones as well. As one wit has noted, all optimization is an exercise in caching. (Terje Mathisen, one of the better assembler coders on the planet.) It seems unusual that code generation options which give access to more registers would ever result in slower code...

The slowdown is probably due to the unroll-loops switch, which can actually hurt code due to the larger footprint (less cache coherency). The extra registers are not all that important because of pipelining and other hardware tricks. Pretty much all the old assembly strategies, such as forcing local variables to registers, are basically obsolete... especially with regards to integer math. As I said before, modern CPUs are essentially RISC engines with a CISC preprocessing engine laid on top. Things are much more complicated than they were in the old days, where you could count instructions for the assembly optimization process.

I suspect that there is little or no difference between -march=686 and the various specific architectures. Did anybody think to look at the binaries and check the amount of differences? I bet code compiled for -march=opteron will run just fine on a Pentium 2 if compiled for 32 bit.

merlin
Re: [PERFORM] New to PostgreSQL, performance considerations
On Fri, 15 Dec 2006, Merlin Moncure wrote:
> The slowdown is probably due to the unroll-loops switch, which can actually hurt code due to the larger footprint (less cache coherency).

The cache issues are so important with current processors that I'd suggest throwing -Os (optimize for size) into the mix people test. That one may stack usefully with -O2, but probably not with -O3 (-O3 includes optimizations that increase code size).

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Martijn van Oosterhout kleptog@svana.org writes:
> On Fri, Dec 15, 2006 at 12:20:46PM +0000, Simon Riggs wrote:
>> Maybe sampling every 10 rows will bring things down to an acceptable level (after the first N). You tried less than 10, didn't you?
> Yeah, it reduced the number of calls as the count got larger. It broke somewhere, though I don't quite remember why.

The fundamental problem with it was the assumption that different executions of a plan node will have the same timing. That's not true; in fact, it's not even approximately true. IIRC the patch did realize that first-time-through is not a predictor for the rest, but some of our plan nodes have enormous variance even after the first time. I think the worst case is batched hash joins.

regards, tom lane
Re: [PERFORM] New to PostgreSQL, performance considerations
At 09:23 AM 12/15/2006, Merlin Moncure wrote:
> On 12/15/06, Ron [EMAIL PROTECTED] wrote:
>> It seems unusual that code generation options which give access to more registers would ever result in slower code...
> The slowdown is probably due to the unroll-loops switch, which can actually hurt code due to the larger footprint (less cache coherency).

I have seen that effect as well occasionally in the last few decades ;-) OTOH, suspicion is not _proof_; and I've seen other optimizations turn out to be pessimizations over the years as well.

> The extra registers are not all that important because of pipelining and other hardware tricks.

No. Whoever told you this or gave you such an impression was mistaken. There are many instances of x86-compatible code that get 30-40% speedups just because they get access to 16 rather than 8 GPRs when recompiled for x86-64.

> Pretty much all the old assembly strategies, such as forcing local variables to registers, are basically obsolete... especially with regards to integer math.

Again, not true. OTOH, humans are unlikely at this point to be able to duplicate the accuracy of the compiler's register coloring algorithms, especially on x86 compatibles. (The flip side is that _expert_ humans can often put the quirky register set and instruction pipelines of x86 compatibles to more effective use for a specific chunk of code than even the best compilers can.)

> As I said before, modern CPUs are essentially RISC engines with a CISC preprocessing engine laid on top.

I'm sure you meant modern =x86-compatible= CPUs are essentially RISC engines with a CISC engine on top. Just as all the world's not a VAX, all CPUs are not x86 compatibles. Forgetting this has occasionally cost folks I know...

> Things are much more complicated than they were in the old days, where you could count instructions for the assembly optimization process.

Those were the =very= old days in computer time...

> I suspect that there is little or no difference between -march=686 and the various specific architectures.

There should be. The FSF compiler folks (and the rest of the industry compiler folks, for that matter) are far from stupid. They are not just adding compiler switches because they randomly feel like it. Evidence suggests that the most recent CPUs are in need of =more= arch-specific TLC compared to their ancestors, and that this trend is not only going to continue, it is going to accelerate.

> Did anybody think to look at the binaries and check the amount of differences? I bet code compiled for -march=opteron will run just fine on a Pentium 2 if compiled for 32 bit.

Sucker bet, given that the whole point of a 32b x86 compatible is to be able to run code on any IA32 ISA CPU. OTOH, I bet that code optimized for best performance on a P2 is not getting best performance on a P4. Or vice versa. ;-) The big arch-specific differences in Kx's are in 64b mode, not 32b.

Ron Peacetree
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Gregory Stark [EMAIL PROTECTED] writes:
> There are various attempts at providing better timing infrastructure at low overhead, but I'm not sure what's out there currently. I expect that to do this, what we'll have to do is invent a pg_* abstraction that has various implementations on different architectures.

You've got to be kidding. Surely it's glibc's responsibility, not ours, to implement gettimeofday correctly for the hardware.

regards, tom lane
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
On Fri, Dec 15, 2006 at 09:56:57AM -0500, Tom Lane wrote:
> Martijn van Oosterhout kleptog@svana.org writes:
>> Yeah, it reduced the number of calls as the count got larger. It broke somewhere, though I don't quite remember why.
> The fundamental problem with it was the assumption that different executions of a plan node will have the same timing. That's not true; in fact, it's not even approximately true. IIRC the patch did realize that first-time-through is not a predictor for the rest, but some of our plan nodes have enormous variance even after the first time. I think the worst case is batched hash joins.

It didn't assume that, because that's obviously bogus. It assumed the durations would be spread as a normal distribution, which meant that over time the average of the measured iterations would approach the actual average. It tried to take enough measurements to keep the expected error small, but it's statistics: you can only say this will give the right answer 95% of the time.

You are correct, though: the error was caused by unexpectedly large variations or, more likely, an unexpected distribution curve. Statistically, we took enough samples not to be affected significantly by large variations. Even if it looked more like a gamma distribution, it should not have been as far off as it was.

Looking at alternative approaches, like sampling with a timer, you end up with the same problem: sometimes the calculations will fail and produce something strange. The simplest example being that a 100Hz timer is not going to produce any useful information for queries in the millisecond range. A higher-frequency timer than that is not going to be available portably.

You could probably throw more effort into refining the statistics behind it, but at some point we're going to have to draw a line and say: it's going to be wrong X% of the time, deal with it. If we're not willing to say that, then there's no point going ahead with any statistical approach.

Have a nice day,
--
Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
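[The failure mode described above can be shown with a contrived, deterministic example: if rare, very expensive rows (say, ones that trigger a batch spill in a hash join) happen to fall between the sampled rows, systematic sampling badly underestimates the total. All values here are invented for illustration:]

```python
# 10,000 rows: almost all cost 1 unit, but 50 rows cost 500 units each.
# Place the expensive rows at indices the every-100th-row sample never
# visits (index % 100 == 50, sample indices have index % 100 == 0).
durations = [1.0] * 10_000
for i in range(50, 10_000, 200):
    durations[i] = 500.0

actual = sum(durations)                   # 9950*1 + 50*500 = 34950
sample = durations[::100]                 # 100 sampled rows, all cheap
estimate = (sum(sample) / len(sample)) * len(durations)

print(f"actual:   {actual:.0f}")          # 34950
print(f"estimate: {estimate:.0f}")        # 10000 -- off by ~3.5x
```

A normal-distribution assumption says a sample this large should be close; a heavy-tailed distribution breaks that, which is consistent with the "unexpected distribution curve" diagnosis above.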
Re: [PERFORM] New to PostgreSQL, performance considerations
On 12/15/06, Ron [EMAIL PROTECTED] wrote:
>> The extra registers are not all that important because of pipelining and other hardware tricks.
> No. Whoever told you this or gave you such an impression was mistaken. There are many instances of x86-compatible code that get 30-40% speedups just because they get access to 16 rather than 8 GPRs when recompiled for x86-64.

I'm not debating that this is true in specific cases. Encryption and video en/decoding have been shown to be faster in 64-bit mode on the same architecture (a cursory search in Google will confirm this). However, 32 vs 64 bit is not the same argument, since there are a lot of other variables besides more registers. 64-bit mode is often slower on many programs because of the extra code size from 64-bit pointers. We benchmarked PostgreSQL internally here and found it to be fastest in 32-bit mode running on a 64-bit platform -- this was on a quad Opteron 870 running our specific software stack; your results might be different, of course.

> I'm sure you meant modern =x86-compatible= CPUs are essentially RISC engines with a CISC engine on top. Just as all the world's not a VAX, all CPUs are not x86 compatibles. Forgetting this has occasionally cost folks I know...

Yes; in fact I made this point earlier.

> There should be. The FSF compiler folks (and the rest of the industry compiler folks, for that matter) are far from stupid. They are not just adding compiler switches because they randomly feel like it. Evidence suggests that the most recent CPUs are in need of =more= arch-specific TLC compared to their ancestors, and that this trend is not only going to continue, it is going to accelerate.
> ...
> The big arch-specific differences in Kx's are in 64b mode, not 32b.

I don't think so. IMO all the processor-specific instruction sets were hacks of 32-bit mode to optimize specific tasks. Except for certain things, these instructions are rarely, if ever, used in 64-bit mode, especially in integer math (i.e. database binaries). Since the Intel and AMD 64-bit instruction sets are virtually identical, I submit that -march is not really important anymore except for very, very specific (but important) cases like spinlocks. This thread is about how much architecture-dependent binaries can beat standard ones. I say they don't very much at all, and with the specific exception of Daniel's benchmarking, the results posted to this list bear that out.

merlin
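[The "extra code size from 64-bit pointers" point can be made concrete: pointer width doubles on a 64-bit build, so pointer-heavy structures (B-tree nodes, linked lists, tuple headers) grow even when the computation itself gets faster registers. A small sketch; the node layout is hypothetical:]

```python
import struct

# struct.calcsize('P') reports the platform's native pointer width.
ptr_bytes = struct.calcsize('P')        # 8 on a 64-bit build, 4 on 32-bit

# Hypothetical tree node with left/right/parent pointers: the pointer
# overhead per node doubles going from 32-bit to 64-bit, which means
# fewer nodes fit per cache line and per page.
node_pointers = 3
print(f"pointer width: {ptr_bytes} bytes")
print(f"pointer overhead per node: {node_pointers * ptr_bytes} bytes")
```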
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Friday, 15 December 2006 at 11:28, Simon Riggs wrote:
>> Until we work out a better solution, we can fix this in two ways:
>> 1. EXPLAIN ANALYZE [ [ WITH | WITHOUT ] TIME STATISTICS ] ...
>> 2. enable_analyze_timer = off | on (default) (USERSET)
> The second one is enough in my mind.

I don't see any point in either one. If you're not going to collect timing data, then the only useful info EXPLAIN ANALYZE could provide is knowledge of which rowcount estimates are badly off ... and to get that, you have to wait for the query to finish, which may well be impractical even without the gettimeofday overhead. We had discussed upthread the idea of having an option to issue a NOTICE as soon as any actual rowcount exceeds the estimate by some-configurable-percentage, and that seems to me to be a much more useful response to the problem of "EXPLAIN ANALYZE takes too long" than removing gettimeofday. One thing that's not too clear to me, though, is how the NOTICE would identify the node at which the rowcount was exceeded...

regards, tom lane
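[The NOTICE idea sketched above amounts to a watchdog on each node's row stream. A hedged, hypothetical sketch of the mechanism -- the wrapper, names, and threshold parameter are invented for illustration and are not PostgreSQL's executor API:]

```python
import warnings

def watch_rowcount(rows, node_name, estimated, threshold_pct=100):
    """Yield rows unchanged, warning once when the actual rowcount
    exceeds the planner's estimate by more than threshold_pct percent."""
    limit = estimated * (1 + threshold_pct / 100)
    warned = False
    count = 0
    for row in rows:
        count += 1
        if not warned and count > limit:
            warnings.warn(
                f"{node_name}: actual rows ({count}) already exceed "
                f"estimate ({estimated}) by more than {threshold_pct}%")
            warned = True
        yield row

# Planner guessed 10 rows; the node actually produces 50, so the
# warning fires mid-scan rather than after the query completes.
consumed = list(watch_rowcount(range(50), "Seq Scan on t", estimated=10))
```

Note this answers the "wait for the query to finish" objection: the warning fires as soon as the threshold is crossed. Tom's open question, how the notice names the offending node, is what the node_name parameter hand-waves here.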
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
Tom Lane [EMAIL PROTECTED] writes:
> Gregory Stark [EMAIL PROTECTED] writes:
>> There are various attempts at providing better timing infrastructure at low overhead, but I'm not sure what's out there currently. I expect that to do this, what we'll have to do is invent a pg_* abstraction that has various implementations on different architectures.
> You've got to be kidding. Surely it's glibc's responsibility, not ours, to implement gettimeofday correctly for the hardware.

Except for two things:

a) We don't really need gettimeofday. That means we don't need something sensitive to adjustments made by ntp etc. In fact, that would be actively bad. Currently, if the user runs date to reset his clock back a few days, I bet interesting things happen to a large EXPLAIN ANALYZE that's running. In fact we don't need something that represents any absolute time, only time elapsed since some other point we choose. That might be easier to implement than what glibc has to do to implement gettimeofday fully.

b) glibc may not want to endure an overhead on every syscall and context switch to make gettimeofday faster, on the assumption that gettimeofday is a rare call and it should pay the price rather than imposing an overhead on everything else. Postgres knows when it's running an EXPLAIN ANALYZE, and a 1% overhead would be entirely tolerable, especially if it affected the process pretty much evenly, unlike the per-gettimeofday overhead, which can get as high as 100% on some types of subplans and is negligible on others. And more to the point, Postgres wouldn't have to endure this overhead at all when it's not needed, whereas glibc has no idea when you're going to need gettimeofday next.

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
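[Point (a) is the standard monotonic-clock argument: interval timing should use a clock that measures time since an arbitrary point and cannot be stepped backwards by ntp or by someone running date. A small sketch of the distinction using Python's two clock APIs:]

```python
import time

start_wall = time.time()       # wall clock: can jump if the system clock is reset
start_mono = time.monotonic()  # monotonic: guaranteed never to go backwards

total = sum(i * i for i in range(100_000))  # stand-in for query execution

# The monotonic difference is always a valid non-negative duration,
# even if the wall clock was adjusted while the work ran.
elapsed = time.monotonic() - start_mono
print(f"elapsed: {elapsed:.6f} s")
```

Timing EXPLAIN ANALYZE against the wall clock instead is exactly what makes "date set back a few days mid-query" produce nonsense durations.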
Re: [PERFORM] Insertion to temp table deteriorating over time
Here's the output of VACUUM VERBOSE pg_class. I think it looks fine. I even did it three times in a row, each about 10 minutes apart, just to see what was changing:

INFO:  vacuuming pg_catalog.pg_class
INFO:  index pg_class_oid_index now contains 3263 row versions in 175 pages
DETAIL:  5680 index row versions were removed.
150 index pages have been deleted, 136 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index pg_class_relname_nsp_index now contains 3263 row versions in 1301 pages
DETAIL:  5680 index row versions were removed.
822 index pages have been deleted, 734 are currently reusable.
CPU 0.01s/0.01u sec elapsed 0.03 sec.
INFO:  pg_class: removed 5680 row versions in 109 pages
DETAIL:  CPU 0.00s/0.01u sec elapsed 0.04 sec.
INFO:  pg_class: found 5680 removable, 3263 nonremovable row versions in 625 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 23925 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.04u sec elapsed 0.10 sec.
VACUUM

INFO:  vacuuming pg_catalog.pg_class
INFO:  index pg_class_oid_index now contains 3263 row versions in 175 pages
DETAIL:  24 index row versions were removed.
150 index pages have been deleted, 150 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index pg_class_relname_nsp_index now contains 3263 row versions in 1301 pages
DETAIL:  24 index row versions were removed.
822 index pages have been deleted, 822 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_class: removed 24 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_class: found 24 removable, 3263 nonremovable row versions in 625 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 29581 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

INFO:  vacuuming pg_catalog.pg_class
INFO:  index pg_class_oid_index now contains 3263 row versions in 175 pages
DETAIL:  150 index pages have been deleted, 150 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index pg_class_relname_nsp_index now contains 3263 row versions in 1301 pages
DETAIL:  822 index pages have been deleted, 822 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_class: found 0 removable, 3263 nonremovable row versions in 625 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 29605 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

The one thing that seems to be steadily increasing is the number of unused item pointers. Not sure if that's normal. I should also point out that SELECT statements are not experiencing the same degradation as the INSERTs to the temp table. SELECTs are performing just as well now (24 hours since restarting the connection) as they did immediately after restarting the connection. INSERTs to the temp table are 5 times slower now than they were 24 hours ago. I wonder if the problem has to do with a long-running ODBC connection.

Steve

On 12/14/06, Tom Lane [EMAIL PROTECTED] wrote:
> Steven Flatt [EMAIL PROTECTED] writes:
>> Regarding your other email -- interesting -- but we are vacuuming pg_class every hour. So I don't think the answer lies there...
> That's good, but is the vacuum actually accomplishing anything? I'm wondering if there's also a long-running transaction in the mix. Try a manual VACUUM VERBOSE pg_class; after the thing has slowed down, and see what it says about removable and nonremovable rows.
> regards, tom lane
Re: [PERFORM] File Systems Compared
The reply wasn't directly copied to the performance list, but I will copy this one back.

On Thu, Dec 14, 2006 at 13:21:11 -0800, Ron Mayer [EMAIL PROTECTED] wrote: Bruno Wolff III wrote: On Thu, Dec 14, 2006 at 01:39:00 -0500, Jim Nasby [EMAIL PROTECTED] wrote: On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote: This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful). Is it true that some drives ignore this, or is it mostly an urban legend that was started by testers who didn't have kernels with write barrier support? I'd be especially interested in knowing if there are any currently available drives which ignore those commands. In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different. With XFS the default is apparently to enable write barrier support unless you explicitly disable it with the nobarrier mount option. It also will warn you in the system log if the underlying device doesn't have write barrier support. SGI recommends that you use the nobarrier mount option if you do have a persistent (battery-backed) write cache on your raid device. http://oss.sgi.com/projects/xfs/faq.html#wcache But would that actually provide a meaningful benefit? When you COMMIT, the WAL data must hit non-volatile storage of some kind, which without a BBU or something similar, means hitting the platter. So I don't see how enabling the disk cache will help, unless of course it's ignoring fsync.
With write barriers, fsync() waits for the physical disk; but I believe the background writes from write() done by pdflush don't have to; so it's kinda like only disabling the cache for WAL files and the filesystem's journal, but having it enabled for the rest of your write activity (the tables except at checkpoints? the log file?). Note the use case for this is more for hobbyists or development boxes. You can only use it on software raid (md) 1, which rules out most real systems. Ugh. Looking for where that's documented; and hoping it is or will soon work on software 1+0 as well. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] File Systems Compared
On Thu, Dec 14, 2006 at 13:21:11 -0800, Ron Mayer [EMAIL PROTECTED] wrote: Bruno Wolff III wrote: On Thu, Dec 14, 2006 at 01:39:00 -0500, Jim Nasby [EMAIL PROTECTED] wrote: On Dec 11, 2006, at 12:54 PM, Bruno Wolff III wrote: This appears to be changing under Linux. Recent kernels have write barriers implemented using cache flush commands (which some drives ignore, so you need to be careful). Is it true that some drives ignore this; or is it mostly an urban legend that was started by testers that didn't have kernels with write barrier support. I'd be especially interested in knowing if there are any currently available drives which ignore those commands. I saw posts claiming this, but no specific drives mentioned. I did see one post that claimed that the cache flush command was mandated (not optional) by the spec. In very recent kernels, software raid using raid 1 will also handle write barriers. To get this feature, you are supposed to mount ext3 file systems with the barrier=1 option. For other file systems, the parameter may need to be different. With XFS the default is apparently to enable write barrier support unless you explicitly disable it with the nobarrier mount option. It also will warn you in the system log if the underlying device doesn't have write barrier support. I think there might be a similar patch for ext3 going into 2.6.19. I haven't checked a 2.6.19 kernel to make sure though. SGI recommends that you use the nobarrier mount option if you do have a persistent (battery backed) write cache on your raid device. http://oss.sgi.com/projects/xfs/faq.html#wcache But would that actually provide a meaningful benefit? When you COMMIT, the WAL data must hit non-volatile storage of some kind, which without a BBU or something similar, means hitting the platter. So I don't see how enabling the disk cache will help, unless of course it's ignoring fsync. 
With write barriers, fsync() waits for the physical disk; but I believe the background writes from write() done by pdflush don't have to; so it's kinda like only disabling the cache for WAL files and the filesystem's journal, but having it enabled for the rest of your write activity (the tables except at checkpoints? the log file?). Not exactly. Whenever you commit the file system log or fsync the wal file, all previously written blocks will be flushed to the disk platter, before any new write requests are honored. So journalling semantics will work properly. Note the use case for this is more for hobbyists or development boxes. You can only use it on software raid (md) 1, which rules out most real systems. Ugh. Looking for where that's documented; and hoping it is or will soon work on software 1+0 as well. I saw a comment somewhere that raid 0 provided some problems and the suggestion was to handle the barrier at a different level (though I don't know how you could). So I don't believe 1+0 or 5 are currently supported or will be in the near term. The other feature I would like is to be able to use write barriers with encrypted file systems. I haven't found anything on whether or not there are near-term plans by anyone to support that. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
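To make the mount options discussed above concrete, here is a hedged sketch (device names and mount points are hypothetical; check your kernel and filesystem documentation for the exact spelling on your system):

```shell
# ext3: write barriers must be requested explicitly with barrier=1.
mount -o barrier=1 /dev/md0 /var/lib/pgsql

# XFS: barriers are on by default; per the SGI FAQ cited above, disable
# them only if the array has a persistent (battery-backed) write cache.
mount -o nobarrier /dev/sdb1 /var/lib/pgsql
```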
Re: [PERFORM] New to PostgreSQL, performance considerations
At 09:50 AM 12/15/2006, Greg Smith wrote: On Fri, 15 Dec 2006, Merlin Moncure wrote: The slower is probably due to the unroll loops switch, which can actually hurt code due to the larger footprint (less cache coherency). The cache issues are so important with current processors that I'd suggest throwing -Os (optimize for size) into the mix people test. That one may stack usefully with -O2, but probably not with -O3 (3 includes optimizations that increase code size).

-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. -Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version

Hmmm. That list of disabled flags bears thought.

-falign-functions -falign-jumps -falign-loops -falign-labels
1= Most RISC CPUs' performance is very sensitive to misalignment issues. Not recommended to turn these off.

-freorder-blocks Reorder basic blocks in the compiled function in order to reduce the number of taken branches and improve code locality. Enabled at levels -O2, -O3.

-freorder-blocks-and-partition In addition to reordering basic blocks in the compiled function, in order to reduce the number of taken branches, partitions hot and cold basic blocks into separate sections of the assembly and .o files, to improve paging and cache locality performance. This optimization is automatically turned off in the presence of exception handling, for link-once sections, for functions with a user-defined section attribute, and on any architecture that does not support named sections.

2= Most RISC CPUs are cranky about branchy code and (lack of) cache locality. Wouldn't suggest punting these either.
-fprefetch-loop-arrays If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays. This option may generate better or worse code; results are highly dependent on the structure of loops within the source code.

3= OTOH, this one looks worth experimenting with turning off.

-ftree-vect-loop-version Perform loop versioning when doing loop vectorization on trees. When a loop appears to be vectorizable except that data alignment or data dependence cannot be determined at compile time, then vectorized and non-vectorized versions of the loop are generated along with runtime checks for alignment or dependence to control which version is executed. This option is enabled by default except at level -Os, where it is disabled.

4= ...and this one looks like a 50/50 shot.

Ron Peacetree ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
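The code-size tradeoff between -O2, -O3 and -Os that the flag descriptions above describe is easy to see directly. A minimal sketch (file names are arbitrary; the exact size deltas depend on your gcc version and target):

```shell
# Compile the same translation unit at -O2, -O3 and -Os and compare
# segment sizes; -O3's extra inlining/vectorization usually grows the text
# segment, while -Os shrinks it.
cat > /tmp/sizedemo.c <<'EOF'
#include <stdio.h>
int main(void) {
    long sum = 0;
    for (int i = 0; i < 1000000; i++)
        sum += (long)i * i;
    printf("%ld\n", sum);
    return 0;
}
EOF
for opt in O2 O3 Os; do
    gcc -std=c99 -$opt -o /tmp/sizedemo_$opt /tmp/sizedemo.c
done
size /tmp/sizedemo_O2 /tmp/sizedemo_O3 /tmp/sizedemo_Os
```

The same comparison against the actual postgres binary (configure with CFLAGS=-Os, etc.) is what would tell you whether the cache-footprint argument holds for this workload.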
Re: [HACKERS] [PERFORM] EXPLAIN ANALYZE on 8.2
At 10:45 AM 12/15/2006, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: There are various attempts at providing better timing infrastructure at low overhead but I'm not sure what's out there currently. I expect to do this what we'll have to do is invent a pg_* abstraction that has various implementations on different architectures. You've got to be kidding. Surely it's glibc's responsibility, not ours, to implement gettimeofday correctly for the hardware. regards, tom lane I agree with Tom on this. Perhaps the best compromise is for the pg community to make thoughtful suggestions to the glibc community? Ron Peacetree ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
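To get a feel for why per-row timing makes EXPLAIN ANALYZE so expensive, one can measure the cost of a single timestamp call in a tight loop. A rough sketch in Python (absolute numbers will differ from a C gettimeofday loop, but the shape of the argument is the same: the per-call cost is paid twice per row at every instrumented plan node):

```python
import time

def clock_call_overhead(n=1_000_000):
    """Return the average cost, in nanoseconds, of one timestamp call."""
    start = time.perf_counter()
    for _ in range(n):
        time.time()  # stand-in for gettimeofday()
    elapsed = time.perf_counter() - start
    return elapsed / n * 1e9

ns = clock_call_overhead()
# A plan node that touches 10 million rows pays this roughly twice per row.
print(f"~{ns:.0f} ns per call; 2 x 10M calls adds "
      f"~{2 * 10_000_000 * ns / 1e9:.2f} s of pure timing overhead")
```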
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: Here's the output of VACUUM VERBOSE pg_class. I think it looks fine. I even did it three times in a row, each about 10 minutes apart, just to see what was changing: Hm, look at the numbers of rows removed: INFO: pg_class: found 5680 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. INFO: pg_class: found 24 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. INFO: pg_class: found 0 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. The lack of unremovable dead rows is good, but why were there so many dead rows the first time? You didn't say what the cycle time is on your truncate-and-refill process, but the last two suggest that the average rate of accumulation of dead pg_class rows is only a couple per minute, in which case it's been a lot longer than an hour since the previous VACUUM of pg_class. I'm back to suspecting that you don't vacuum pg_class regularly. You mentioned having an hourly cron job to fire off vacuums ... are you sure it's run as a database superuser? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
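A minimal cron entry matching Tom's suggestion might look like the following (hypothetical database name and schedule; the key point is that the job must run as a database superuser, or VACUUM silently skips system catalogs such as pg_class that the cron user doesn't own):

```shell
# /etc/crontab -- hourly catalog vacuum, run as the postgres superuser
0 * * * * postgres vacuumdb --quiet --table pg_class mydb
```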
Re: [PERFORM] New to PostgreSQL, performance considerations
At 10:55 AM 12/15/2006, Merlin Moncure wrote: On 12/15/06, Ron [EMAIL PROTECTED] wrote: There are many instances of x86 compatible code that get 30-40% speedups just because they get access to 16 rather than 8 GPRs when recompiled for x86-64. ...We benchmarked PostgreSQL internally here and found it to be fastest in 32 bit mode running on a 64 bit platform -- this was on a quad opteron 870 running our specific software stack; your results might be different, of course. On AMD Kx's, you probably will get best performance in 64b mode (so you get all those extra registers and other stuff) while using 32b pointers (to keep code size and memory footprint down). On Intel C2's, things are more complicated since Intel's x86-64 implementation and memory IO architecture are both different enough from AMD's to have caused some consternation on occasion when Intel's 64b performance did not match AMD's. The big arch specific differences in Kx's are in 64b mode. Not 32b. I don't think so. IMO all the processor specific instruction sets were hacks of 32 bit mode to optimize specific tasks. Except for certain things, these instructions are rarely, if ever, used in 64 bit mode, especially in integer math (i.e. database binaries). Since Intel's and AMD's 64 bit implementations are virtually identical, I submit that -march is not really important anymore except for very, very specific (but important) cases like spinlocks. Take a good look at the processor specific manuals and the x86-64 benches around the net. The evidence outside the DBMS domain is pretty solidly in contrast to your statement and position. Admittedly, DBMS are not web servers or FPS games or ... That's why we need to do our own rigorous study of the subject. This thread is about how much architecture-dependent binaries can beat standard ones. I say they don't very much at all, and with the specific exception of Daniel's benchmarking, the results posted to this list bear that out.
...and IMHO the issue is still very much undecided given that we don't have enough properly obtained and documented evidence. ATM, the most we can say is that in a number of systems with modest physical IO subsystems that are not running Gentoo Linux, we have not been able to duplicate the results. (We've also gotten some interesting results like yours suggesting the arch specific optimizations are bad for pg performance in your environment.) In the process, questions have been asked and issues raised regarding both the tools involved and the proper way to use them. We really do need to have someone other than Daniel duplicate his Gentoo environment and independently try to duplicate his results. ...and let us bear in mind that this is not just intellectual curiosity. The less mysterious pg is, the better the odds pg will be adopted in any specific case. Ron Peacetree ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Insertion to temp table deteriorating over time
Our application is such that there is a great deal of activity at the beginning of the hour and minimal activity near the end of the hour. Those 3 vacuums were done at (approximately) 30 minutes past, 40 minutes past, and 50 minutes past the hour, during low activity. Vacuums of pg_class look like they're being done on the hour. So it's not surprising that the first vacuum found a lot of dead rows while the latter two found very few. In fact, I just did another vacuum (about 30 minutes past the hour again) and got: INFO: pg_class: found 5490 removable, 3263 nonremovable row versions in 171 pages DETAIL: 0 dead row versions cannot be removed yet. ... and clearly a vacuum was done under an hour ago. The truncate and re-fill process is done once per hour, at the end of the high-load cycle, so I doubt that's even a big contributor to the number of removable rows in pg_class. For this particular setup, we expect high load for 10-15 minutes at the beginning of the hour, which is the case when a new connection is initialized. After a day or so (as is happening right now), the high-load period spills into the second half of the hour. Within 3-4 days, we start spilling into the next hour and, as you can imagine, everything gets behind and we spiral down from there. For now, our workaround is to manually kill the connection every few days, but I would like a better solution than setting up a cron job to do this! Thanks again, Steve On 12/15/06, Tom Lane [EMAIL PROTECTED] wrote: Hm, look at the numbers of rows removed: INFO: pg_class: found 5680 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. INFO: pg_class: found 24 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. INFO: pg_class: found 0 removable, 3263 nonremovable row versions in 625 pages DETAIL: 0 dead row versions cannot be removed yet. 
The lack of unremovable dead rows is good, but why were there so many dead rows the first time? You didn't say what the cycle time is on your truncate-and-refill process, but the last two suggest that the average rate of accumulation of dead pg_class rows is only a couple per minute, in which case it's been a lot longer than an hour since the previous VACUUM of pg_class. I'm back to suspecting that you don't vacuum pg_class regularly. You mentioned having an hourly cron job to fire off vacuums ... are you sure it's run as a database superuser? regards, tom lane
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: Our application is such that there is a great deal of activity at the beginning of the hour and minimal activity near the end of the hour. OK ... The truncate and re-fill process is done once per hour, at the end of the high-load cycle, so I doubt that's even a big contributor to the number of removable rows in pg_class. Oh, then where *are* the removable rows coming from? At this point I think that the truncate/refill thing is not the culprit, or at any rate is only one part of a problematic usage pattern that we don't see all of yet. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Insertion to temp table deteriorating over time
Good question, and I agree with your point. Are the removable rows in pg_class even an issue? So what if 5000-6000 dead tuples are generated every hour then vacuumed? Performance continues to steadily decline over a few days time. Memory usage does not appear to be bloating. Open file handles remain fairly fixed. Is there anything else I can monitor (perhaps something to do with the odbc connection) that I could potentially correlate with the degrading performance? Steve On 12/15/06, Tom Lane [EMAIL PROTECTED] wrote: Oh, then where *are* the removable rows coming from? At this point I think that the truncate/refill thing is not the culprit, or at any rate is only one part of a problematic usage pattern that we don't see all of yet. regards, tom lane
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: Are the removable rows in pg_class even an issue? So what if 5000-6000 dead tuples are generated every hour then vacuumed? Performance continues to steadily decline over a few days time. Memory usage does not appear to be bloating. Open file handles remain fairly fixed. Is there anything else I can monitor (perhaps something to do with the odbc connection) that I could potentially correlate with the degrading performance? At this point I think the most productive thing for you to do is to try to set up a self-contained test case that reproduces the slowdown. That would allow you to poke at it without disturbing your production system, and would let other people look at it too. From what you've said, I'd try a simple little program that inserts some data into a temp table, truncates the table, and repeats, as fast as it can, using the same SQL commands as your real code and similar but dummy data. It shouldn't take long to observe the slowdown if it occurs. If you can't reproduce it in isolation then we'll know that some other part of your application environment is contributing to the problem; if you can, I'd be happy to look at the test case with gprof or oprofile and find out exactly what's going on. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
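Tom's suggested repro loop can be sketched as below. To keep the sketch runnable anywhere, it uses Python's built-in sqlite3 as a stand-in for the real server; against the production system you would issue the same CREATE TEMP TABLE / INSERT / TRUNCATE sequence through the ODBC connection and watch whether per-cycle times drift upward:

```python
import sqlite3
import time

def run_cycles(cycles=50, rows=1000):
    """Time repeated fill/truncate cycles; a healthy system shows flat times."""
    conn = sqlite3.connect(":memory:")  # stand-in for the real connection
    conn.execute("CREATE TABLE scratch (id INTEGER, payload TEXT)")
    timings = []
    for _ in range(cycles):
        start = time.perf_counter()
        conn.executemany(
            "INSERT INTO scratch VALUES (?, ?)",
            ((i, "x" * 100) for i in range(rows)),
        )
        conn.execute("DELETE FROM scratch")  # sqlite has no TRUNCATE
        conn.commit()
        timings.append(time.perf_counter() - start)
    conn.close()
    return timings

timings = run_cycles()
# Compare early vs. late cycles: a steady upward drift is the symptom
# Steven describes; flat averages mean the test case failed to reproduce it.
print(f"first 10 avg: {sum(timings[:10]) / 10:.6f}s, "
      f"last 10 avg: {sum(timings[-10:]) / 10:.6f}s")
```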
Re: [PERFORM] Insertion to temp table deteriorating over time
I've been trying to reproduce the problem for days now :). I've done pretty much exactly what you describe below, but I can't reproduce the problem on any of our lab machines. Something is indeed special in this environment. Thanks for all your help, Steve On 12/15/06, Tom Lane [EMAIL PROTECTED] wrote: Steven Flatt [EMAIL PROTECTED] writes: Are the removable rows in pg_class even an issue? So what if 5000-6000 dead tuples are generated every hour then vacuumed? Performance continues to steadily decline over a few days time. Memory usage does not appear to be bloating. Open file handles remain fairly fixed. Is there anything else I can monitor (perhaps something to do with the odbc connection) that I could potentially correlate with the degrading performance? At this point I think the most productive thing for you to do is to try to set up a self-contained test case that reproduces the slowdown. That would allow you to poke at it without disturbing your production system, and would let other people look at it too. From what you've said, I'd try a simple little program that inserts some data into a temp table, truncates the table, and repeats, as fast as it can, using the same SQL commands as your real code and similar but dummy data. It shouldn't take long to observe the slowdown if it occurs. If you can't reproduce it in isolation then we'll know that some other part of your application environment is contributing to the problem; if you can, I'd be happy to look at the test case with gprof or oprofile and find out exactly what's going on. regards, tom lane
Re: [PERFORM] Insertion to temp table deteriorating over time
Steven Flatt [EMAIL PROTECTED] writes: I've been trying to reproduce the problem for days now :). I've done pretty much exactly what you describe below, but I can't reproduce the problem on any of our lab machines. Something is indeed special in this environment. Yuck. You could try strace'ing the problem backend and see if anything is visibly different between fast and slow operation. I don't suppose you have oprofile on that machine, but if you did it'd be even better. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] File Systems Compared
On Fri, Dec 15, 2006 at 10:34:15 -0600, Bruno Wolff III [EMAIL PROTECTED] wrote: The reply wasn't (directly copied to the performance list, but I will copy this one back. Sorry about this one, I meant to intersperse my replies and hit the 'y' key at the wrong time. (And there ended up being a copy on performance anyway from the news gateway.) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] New to PostgreSQL, performance considerations
On Dec 15, 2006, at 17:53 , Ron wrote: At 09:50 AM 12/15/2006, Greg Smith wrote: On Fri, 15 Dec 2006, Merlin Moncure wrote: The slower is probably due to the unroll loops switch which can actually hurt code due to the larger footprint (less cache coherency). The cache issues are so important with current processors that I'd suggest throwing -Os (optimize for size) into the mix people test. That one may stack usefully with -O2, but probably not with -O3 (3 includes optimizations that increase code size). -Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size. So far I have been compiling PostgreSQL and running my pgbench script manually, but this makes me want to modify my script to run pgbench automatically using all possible permutations of a set of compiler flags. Last I tried GCC to produce 32-bit code on this Opteron system, though, it complained about the lack of a compiler; can I persuade it to generate 32-bit code (or 32-bit pointers for that matter) without going the cross-compilation route? Alexander. ---(end of broadcast)--- TIP 6: explain analyze is your friend
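Enumerating the build permutations Alexander describes is straightforward to automate. A sketch (the flag list and configure invocation are illustrative assumptions, not recommendations):

```python
from itertools import combinations

FLAGS = ["-O3", "-march=nocona", "-funroll-loops", "-pipe"]  # illustrative

def build_commands(flags=FLAGS):
    """Yield one build-and-benchmark command line per subset of the flags."""
    for r in range(len(flags) + 1):
        for combo in combinations(flags, r):
            cflags = " ".join(combo)
            yield f'./configure CFLAGS="{cflags}" && make && pgbench ...'

cmds = list(build_commands())
print(f"{len(cmds)} permutations to benchmark")  # 2^4 = 16 subsets
```

On the 32-bit question: gcc's -m32 switch will generate 32-bit code on an x86-64 host without a full cross-compilation setup, but only if the 32-bit (multilib) support libraries are installed, which may be what was missing.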
Re: [PERFORM] New to PostgreSQL, performance considerations
On Fri, Dec 15, 2006 at 12:24:46PM -0500, Ron wrote: ATM, the most we can say is that in a number of systems with modest physical IO subsystems So I reran it on a 3.2GHz Xeon with 6G RAM off a ramdisk; I/O ain't the bottleneck on that one. Results didn't show any significant gains regardless of compilation options (results hovered around 12k tps). If people want to continue this, I will point out that they should make sure they're linked against the optimized libpq rather than an existing one elsewhere in the library path. Beyond that, I'm done with this thread. Maybe there are some gains to be found somewhere, but the testing done thus far (while limited) is sufficient, IMO, to demonstrate that compiler options aren't going to provide a blow-your-socks-off dramatic performance improvement. Mike Stone ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] opportunity to benchmark a quad core Xeon
I have the opportunity to benchmark a system based on a Supermicro 6015B-8V. It has 2x quad-core Xeon E5320 1.86GHz, 4GB DDR2 533, 1x 73GB 10k SCSI. http://www.supermicro.com/products/system/1U/6015/SYS-6015B-8V.cfm We can add more RAM and drives for testing purposes. Can someone suggest what benchmarks with what settings would be desirable to see how this system performs? I don't believe I've seen any postgres benchmarks done on a quad-core Xeon yet. -- Jeff Frost, Owner [EMAIL PROTECTED] Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954 ---(end of broadcast)--- TIP 6: explain analyze is your friend
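As a starting recipe, a common pgbench sweep looks like this (scale factor and client counts are illustrative assumptions, not recommendations; pick a scale that fits in RAM for a CPU-bound run that exercises all eight cores, or one well beyond RAM for an I/O-bound run):

```shell
# Initialize the pgbench tables at scale factor 100, then sweep client
# counts to see where throughput on the two quad-core sockets flattens out.
pgbench -i -s 100 bench
for c in 1 2 4 8 16 32; do
    pgbench -c $c -t 10000 bench
done
```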