Re: [PERFORM] Wierd context-switching issue on Xeon
Hi Tom, You still have an account on my Unixware Bi-Xeon hyperthreded machine. Feel free to use it for your tests. On Mon, 19 Apr 2004, Tom Lane wrote: Date: Mon, 19 Apr 2004 20:53:09 -0400 From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Joe Conway [EMAIL PROTECTED], scott.marlowe [EMAIL PROTECTED], Bruce Momjian [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Neil Conway [EMAIL PROTECTED] Subject: Re: [PERFORM] Wierd context-switching issue on Xeon I wrote: Here is a test case. Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, which seems to pretty much let the Xeon per se off the hook. Anybody got a multiple Opteron to try? Totally non-Intel CPUs? It would be interesting to see results with non-Linux kernels, too. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: [EMAIL PROTECTED] -- Make your life a dream, make your dream a reality. (St Exupery) ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Deleting certain duplicates
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 So I was thinking maybe of doing the deletion in chunks, perhaps based on reception time. Are there any suggestions for a better way to do this, or using multiple queries to delete selectively a week at a time based on the reception_time. I would say there are a lot of duplicate entries between mid march to the first week of April. You are on the right track, in that dividing up the table will help. However, you cannot divide on the reception_time as that is the unique column. Analyze your data and divide on a row with a fairly uniform distribution over the time period in question. Then copy a segment out, clean it up, and put it back in. Make sure there is an index on the column in question, of course. For example, if 1/10 of the table has a units of 12, you could do something like this: CREATE INDEX units_dev ON forecastelement (units); CREATE TEMPORARY TABLE units_temp AS SELECT * FROM forecastelement WHERE units='12'; CREATE INDEX units_oid_index ON units_temp(oid); (Delete out duplicate rows from units_temp using your previous query or something else) DELETE FROM forecastelement WHERE units='12'; INSERT INTO forecastelement SELECT * FROM units_temp; DELETE FROM units_temp; Repeat as needed until all rows are done. Subsequent runs can be done by doing a INSERT INTO units_temp SELECT * FROM forecastelement WHERE units='...' and skipping the CREATE INDEX steps. On the other hand, your original deletion query may work as is, with the addition of an oid index. Perhaps try an EXPLAIN on it. - -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200404200706 -BEGIN PGP SIGNATURE- iD8DBQFAhQVWvJuQZxSWSsgRAvLEAKDCVcX3Llm8JgszI/BBC1SobtjVawCfVGKu ERcV5J2JolwgZRhMbXnNM90= =JqET -END PGP SIGNATURE- ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Why will vacuum not end?
No, but data is constantly being inserted by userid scores. It is postgres runnimg the vacuum. Dan. -Original Message- From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 12:02 AM To: Shea,Dan [CIS] Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Why will vacuum not end? This vacuum is running a marathon. Why will it not end and show me free space map INFO? We have deleted a lot of data and I would like to be confident that these deletions will be used as free space, rather than creating more table files. Does another postgres query running have a lock on that table? Chris ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Wierd context-switching issue on Xeon
Dual Athlon With one process running 30 cs/second with two process running 15000 cs/second Dave On Tue, 2004-04-20 at 08:46, Jeff wrote: On Apr 19, 2004, at 8:01 PM, Tom Lane wrote: [test case] Quad P3-700Mhz, ServerWorks, pg 7.4.2 - 1 process: 10-30 cs / second 2 process: 100k cs / sec 3 process: 140k cs / sec 8 process: 115k cs / sec Dual P2-450Mhz, non-serverworks (piix) - 1 process 15-20 / sec 2 process 30k / sec 3 (up to 7) process: 15k /sec (Yes, I verified with more processes the cs's drop) And finally, 6 cpu sun e4500, solaris 2.6, pg 7.4.2: 1 - 10 processes: hovered between 2-3k cs/second (there was other stuff running on the machine as well) Verrry interesting. I've got a dual G4 at home, but for convenience Apple doesn't ship a vmstat that tells context switches -- Jeff Trout [EMAIL PROTECTED] http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html !DSPAM:40851da1199651145780980! -- Dave Cramer 519 939 0336 ICQ # 14675561 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Why will vacuum not end?
Shea,Dan [CIS] wrote: No, but data is constantly being inserted by userid scores. It is postgres runnimg the vacuum. Dan. -Original Message- From: Christopher Kings-Lynne [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 12:02 AM To: Shea,Dan [CIS] Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Why will vacuum not end? This vacuum is running a marathon. Why will it not end and show me free space map INFO? We have deleted a lot of data and I would like to be confident that these deletions will be used as free space, rather than creating more table files. Does another postgres query running have a lock on that table? This may be a dumb question (but only because I don't know the answer) Doesn't/shouldn't vacuum have some kind of timeout so if a table is locked it will give up eventually (loudly complaining when it does so)? -- Bill Moran Potential Technologies http://www.potentialtech.com ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Wierd context-switching issue on Xeon
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), Quad Xeon 700MHz/1MB L2 cache, 3GB RAM. Idle-ish (it's a production server) cs/sec ~5000 3 test queries running: procs memoryswap io system cpu r b w swpd free buff cache si sobibo incs us sy id 3 0 0 23380 577680 105912 2145140 0 0 0 0 107 116890 50 14 35 2 0 0 23380 577680 105912 2145140 0 0 0 0 114 118583 50 15 34 2 0 0 23380 577680 105912 2145140 0 0 0 0 107 115842 54 14 32 2 1 0 23380 577680 105920 2145140 0 0 032 156 117549 50 16 35 HTH Matt -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane Sent: 20 April 2004 01:02 To: [EMAIL PROTECTED] Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Neil Conway Subject: Re: [PERFORM] Wierd context-switching issue on Xeon Here is a test case. To set up, run the test_setup.sql script once; then launch two copies of the test_run.sql script. (For those of you with more than two CPUs, see whether you need one per CPU to make trouble, or whether two test_runs are enough.) Check that you get a nestloops-with-index-scans plan shown by the EXPLAIN in test_run. In isolation, test_run.sql should do essentially no syscalls at all once it's past the initial ramp-up. On a machine that's functioning per expectations, multiple copies of test_run show a relatively low rate of semop() calls --- a few per second, at most --- and maybe a delaying select() here and there. What I actually see on Josh's client's machine is a context swap storm: vmstat 1 shows CS rates around 170K/sec. strace'ing the backends shows a corresponding rate of semop() syscalls, with a few delaying select()s sprinkled in. top(1) shows system CPU percent of 25-30 and idle CPU percent of 16-20. I haven't bothered to check how long the test_run query takes, but if it ends while you're still examining the behavior, just start it again. Note the test case assumes you've got shared_buffers set to at least 1000; with smaller values, you may get some I/O syscalls, which will probably skew the results. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index Problem?
Ron St-Pierre wrote: I am using postgres 7.4.1 and have a problem with a plpgsql function. When I run the function on the production server it takes approx 33 minutes to run. I dumped the DB and copied it to a similarly configured box and ran the function and it ran in about 10 minutes. Can anyone offer advice on tuning the function or my database? Here are the lengthy, gory details. F u n c t i o n It updates seven columns of a table 1 to 4 times daily. Current data = 42,000 rows, new data = 30,000 rows. CREATE TYPE employeeType AS (empID INTEGER, updateDate DATE, bDate INTEGER, val1 NUMERIC, val2 NUMERIC, val3 NUMERIC, val4 NUMERIC, favNum NUMERIC); CREATE OR REPLACE FUNCTION updateEmployeeData() RETURNS SETOF employeeType AS ' DECLARE rec RECORD; BEGIN FOR rec IN SELECT empID, updateDate, bDate, val1, val2, val3, val4, favNum FROM newData LOOP RETURN NEXT rec; UPDATE currentData SET val1=rec.val1, val2=rec.val2, val3=rec.val2, val4=rec.val4, favNum=rec.favNum, updateDate=rec.updateDate WHERE empID=rec.empID; END LOOP; RETURN; END; ' LANGUAGE 'plpgsql'; Can't you handle this with a simple update query? UPDATE currentData SET val1 = newData.val1, val2 = newData.val2, val3 = newData.val3, val4 = newData.val4, favNum = newData.favNum, updateDate = newData.updateDate FROM newData WHERE newDate.empID = currentData.empID Jochem -- I don't get it immigrants don't work and steal our jobs - Loesje ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Wierd context-switching issue on Xeon
Hi Tom, Just to explain our hardware situation releated to the FSB of the XEON's. We have older XEON DP in operation with FSB 400 and 2.4 GHz. The XEON MP box runs with 2.5 GHz. The XEON MP box is a Fujitsu Siemens Primergy RX600 with ServerWorks GC LE as chipset. The box, which Dirk were use to compare the behavior, is our newest XEON DP system. This XEON DP box runs with 2.8 GHz and FSB 533 using the Intel 7501 chipset (Supermicro). I would agree to Jush. When PostgreSQL has an issue with the INTEL XEON MP hardware, this is more releated to the chipset. Back to the SQL-Level. We use SELECT FOR UPDATE as semaphore. Should we try another implementation for this semahore on the client side to prevent this issue? Regards Sven. - Original Message - From: Tom Lane [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: Josh Berkus [EMAIL PROTECTED]; [EMAIL PROTECTED]; Neil Conway [EMAIL PROTECTED] Sent: Sunday, April 18, 2004 11:47 PM Subject: Re: [PERFORM] Wierd context-switching issue on Xeon After some further digging I think I'm starting to understand what's up here, and the really fundamental answer is that a multi-CPU Xeon MP box sucks for running Postgres. I did a bunch of oprofile measurements on a machine belonging to one of Josh's clients, using a test case that involved heavy concurrent access to a relatively small amount of data (little enough to fit into Postgres shared buffers, so that no I/O or kernel calls were really needed once the test got going). I found that by nearly any measure --- elapsed time, bus transactions, or machine-clear events --- the spinlock acquisitions associated with grabbing and releasing the BufMgrLock took an unreasonable fraction of the time. I saw about 15% of elapsed time, 40% of bus transactions, and nearly 100% of pipeline-clear cycles going into what is essentially two instructions out of the entire backend. (Pipeline clears occur when the cache coherency logic detects a memory write ordering problem.) I am not completely clear on why this machine-level bottleneck manifests as a lot of context swaps at the OS level. I think what is happening is that because SpinLockAcquire is so slow, a process is much more likely than you'd normally expect to arrive at SpinLockAcquire while another process is also acquiring the spinlock. This puts the two processes into a lockstep condition where the second process is nearly certain to observe the BufMgrLock as locked, and be forced to suspend itself, even though the time the first process holds the BufMgrLock is not really very long at all. If you google for Xeon and cache coherency you'll find quite a bit of suggestive information about why this might be more true on the Xeon setup than others. A couple of interesting hits: http://www.theinquirer.net/?article=10797 says that Xeon MP uses a *slower* FSB than Xeon DP. This would translate directly to more time needed to transfer a dirty cache line from one processor to the other, which is the basic operation that we're talking about here. http://www.aceshardware.com/Spades/read.php?article_id=3187 says that Opterons use a different cache coherency protocol that is fundamentally superior to the Xeon's, because dirty cache data can be transferred directly between two processor caches without waiting for main memory. So in the short term I think we have to tell people that Xeon MP is not the most desirable SMP platform to run Postgres on. (Josh thinks that the specific motherboard chipset being used in these machines might share some of the blame too. I don't have any evidence for or against that idea, but it's certainly possible.) In the long run, however, CPUs continue to get faster than main memory and the price of cache contention will continue to rise. So it seems that we need to give up the assumption that SpinLockAcquire is a cheap operation. In the presence of heavy contention it won't be. One thing we probably have got to do soon is break up the BufMgrLock into multiple finer-grain locks so that there will be less contention. However I am wary of doing this incautiously, because if we do it in a way that makes for a significant rise in the number of locks that have to be acquired to access a buffer, we might end up with a net loss. I think Neil Conway was looking into how the bufmgr might be restructured to reduce lock contention, but if he had come up with anything he didn't mention exactly what. Neil? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Toooo many context switches (maybe SLES8?)
Could this be related to the O(1) scheduler backpatches from 2.6 to 2.4 kernel on newer 2.4er distros (RedHat, SuSE)? Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: Improve spinlock code for recent x86 processors: insert a PAUSE instruction in the s_lock() wait loop, and use test before test-and-set in TAS() macro to avoid unnecessary bus traffic. Patch from Manfred Spraul, reworked a bit by Tom. I thought this had been committed to the 7.4 stable branch as well, but it appears not. I am currently chasing what seems to be the same issue: massive context swapping on a dual Xeon system. I tried back-patching the above-mentioned patch ... it helps a little but by no means solves the problem ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] possible improvement between G4 and G5
There are a few things that you can do to help force yourself to be I/O bound. These include: - RAID 5 for write intensive applications, since multiple writes per synch write is good. (There is a special case for logging or other streaming sequential writes on RAID 5) - Data journaling file systems are helpful in stress testing your checkpoints - Using midsized battery backed up write through buffering controllers. In general, if you have a small cache, you see the problem directly, and a huge cache will balance out load and defer writes to quieter times. That is why a midsized cache is so useful in showing stress in your system only when it is being stressed. Only partly in jest, /Aaron BTW - I am truly curious about what happens to your system if you use separate RAID 0+1 for your logs, disk sorts, and at least the most active tables. This should reduce I/O load by an order of magnitude. Vivek Khera [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] JB == Josh Berkus [EMAIL PROTECTED] writes: JB Aaron, I do consulting, so they're all over the place and tend to be complex. Very few fit in RAM, but still are very buffered. These are almost all backed with very high end I/O subsystems, with dozens of spindles with battery backed up writethrough cache and gigs of buffers, which may be why I worry so much about CPU. I have had this issue with multiple servers. JB Aha, I think this is the difference. I never seem to be able to JB get my clients to fork out for adequate disk support. They are JB always running off single or double SCSI RAID in the host server; JB not the sort of setup you have. Even when I upgraded my system to a 14-spindle RAID5 with 128M cache and 4GB RAM on a dual Xeon system, I still wind up being I/O bound quite often. I think it depends on what your working set turns out to be. My workload really spans a lot more of the DB than I can end up caching. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] sunquery and estimated rows
On Sun, 2004-04-18 at 19:09, Tom Lane wrote: Markus Bertheau [EMAIL PROTECTED] writes: , 17.04.2004, 01:45, Tom Lane : The planner sees that as where scope = some complicated expression and falls back to a default estimate. It won't simplify a sub-select to a constant. (Some people consider that a feature ;-).) Why? It's the only way to prevent it from simplifying when you don't want it to. I'm having a difficult time coming up with a circumstance where that is beneficial except when stats are out of whack. Doesn't a prepared statement also falls back to the default estimate for variables. -- Rod Taylor rbt [at] rbt [dot] ca Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc signature.asc Description: This is a digitally signed message part
Re: [PERFORM] Toooo many context switches (maybe SLES8?)
Don't think so, mine is a vanilla kernel from kernel.org Dave On Thu, 2004-04-15 at 16:03, Dirk Lutzebaeck wrote: Could this be related to the O(1) scheduler backpatches from 2.6 to 2.4 kernel on newer 2.4er distros (RedHat, SuSE)? Tom Lane wrote: Joe Conway [EMAIL PROTECTED] writes: Improve spinlock code for recent x86 processors: insert a PAUSE instruction in the s_lock() wait loop, and use test before test-and-set in TAS() macro to avoid unnecessary bus traffic. Patch from Manfred Spraul, reworked a bit by Tom. I thought this had been committed to the 7.4 stable branch as well, but it appears not. I am currently chasing what seems to be the same issue: massive context swapping on a dual Xeon system. I tried back-patching the above-mentioned patch ... it helps a little but by no means solves the problem ... regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly !DSPAM:408535ce93801252113544! -- Dave Cramer 519 939 0336 ICQ # 14675561 ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] Use of subquery causes seq scan???
I need some help. I have a query that refuses to use the provided index and is always sequentially scanning causing me large performance headaches. Here is the basic situation: Table A: inv_num int typechar . . . pkey (inv_num, type) indx(inv_num) Table B (has the same primary key) Select * from table a where inv_num in (select inv_num from table b where ) Doing this causes sequential scans of both tables. If I do a set enable_seqscan to false before the query, I get an index scan of table b but still seq scan table a. Is there anyway to force table a to use this index (or another) and not sequentially scan the table? I'm running 7.3.4 on RedHat EL 2.1. Thanks, Chris ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Wierd context-switching issue on Xeon
Dirk Lutzebaeck wrote: c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) performs well and I could not observe context switch peaks here (one user active), almost no extra semop calls Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks chipset here nor bigmem. Dirk ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Wierd context-switching issue on Xeon
I tried to test how this is related to cache coherency, by forcing affinity of the two test_run.sql processes to the two cores (pipelines? threads) of a single hyperthreaded xeon processor in an smp xeon box. When the processes are allowed to run on distinct chips in the smp box, the CS storm happens. When they are bound to the two cores of a single hyperthreaded Xeon in the smp box, the CS storm *does* happen. I used the taskset command: taskset 01 -p pid for backend of test_run.sql 1 taskset 01 -p pid for backend of test_run.sql 1 I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on the first Xeon processor in the box. I did this on RedHat Fedora core1 on an intel motherboard (I'll get the part no if it matters) during storms : 300k CS/sec, 75% idle (on a dual xeon (four core)) machine (suggesting serializing/sleeping processes) no storm: 50k CS/sec, 50% idle (suggesting 2 cpu bound processes) Maybe there's a hot block that is bouncing back and forth between caches? or maybe the page holding semaphores? On Apr 19, 2004, at 5:53 PM, Tom Lane wrote: I wrote: Here is a test case. Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, which seems to pretty much let the Xeon per se off the hook. Anybody got a multiple Opteron to try? Totally non-Intel CPUs? It would be interesting to see results with non-Linux kernels, too. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Horribly slow hash join
Dammit, I somehow deleted a bunch of replies to this. Did a TODO ever come out of this? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] Moving postgres to FC disks
I am planning to move the pg databases from the internal RAID to external Fiber Channel over SAN. Question is -With the db size being as big as, say, 30+GB, how do I move it on the new logical drive? (stop postgresql, and simply move it over somehow and make a link?) -Currently, the internal RAID volume is ext3 filesystem. Any recommendations for the filesystem on the new FC volume? Rieserfs? DBs are 7.4.1(RH9), and 7.2.3 (RH8). Appreciate any pointers. Thanks, Anjan
Re: [PERFORM] Use of subquery causes seq scan???
Chris Hoover [EMAIL PROTECTED] writes: Select * from table a where inv_num in (select inv_num from table b where ) I'm running 7.3.4 on RedHat EL 2.1. IN (SELECT) constructs pretty well suck in PG releases before 7.4. Update, or consult the FAQ about rewriting into an EXISTS form. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Wierd context-switching issue on Xeon
Ooops, what I meant to say was that 2 threads bound to one (hyperthreaded) cpu does *NOT* cause the storm, even on an smp xeon. Therefore, the context switches may be a result of cache coherency related delays. (2 threads on one hyperthreaded cpu presumably have tightly coupled 1,l2 cache.) On Apr 20, 2004, at 1:02 PM, Paul Tuckfield wrote: I tried to test how this is related to cache coherency, by forcing affinity of the two test_run.sql processes to the two cores (pipelines? threads) of a single hyperthreaded xeon processor in an smp xeon box. When the processes are allowed to run on distinct chips in the smp box, the CS storm happens. When they are bound to the two cores of a single hyperthreaded Xeon in the smp box, the CS storm *does* happen. er, meant *NOT HAPPEN* I used the taskset command: taskset 01 -p pid for backend of test_run.sql 1 taskset 01 -p pid for backend of test_run.sql 1 I guess that 0 and 1 are the two cores (pipelines? hyper-threads?) on the first Xeon processor in the box. I did this on RedHat Fedora core1 on an intel motherboard (I'll get the part no if it matters) during storms : 300k CS/sec, 75% idle (on a dual xeon (four core)) machine (suggesting serializing/sleeping processes) no storm: 50k CS/sec, 50% idle (suggesting 2 cpu bound processes) Maybe there's a hot block that is bouncing back and forth between caches? or maybe the page holding semaphores? On Apr 19, 2004, at 5:53 PM, Tom Lane wrote: I wrote: Here is a test case. Hmmm ... I've been able to reproduce the CS storm on a dual Athlon, which seems to pretty much let the Xeon per se off the hook. Anybody got a multiple Opteron to try? Totally non-Intel CPUs? It would be interesting to see results with non-Linux kernels, too. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Wierd context-switching issue on Xeon
Dirk, Tom, OK, off IRC, I have the following reports: Linux 2.4.21 or 2.4.20 on dual Pentium III : problem verified Linux 2.4.21 or 2.4.20 on dual Penitum II : problem cannot be reproduced Solaris 2.6 on 6 cpu e4500 (using 8 processes) : problem not reproduced -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Use of subquery causes seq scan???
Please don't reply to messages to start new threads. On Tue, Apr 20, 2004 at 10:20:05 -0400, Chris Hoover [EMAIL PROTECTED] wrote: I need some help. I have a query that refuses to use the provided index and is always sequentially scanning causing me large performance headaches. Here is the basic situation: Table A: inv_num int type char . . . pkey (inv_num, type) indx(inv_num) Table B (has the same primary key) Select * from table a where inv_num in (select inv_num from table b where ) Doing this causes sequential scans of both tables. If I do a set enable_seqscan to false before the query, I get an index scan of table b but still seq scan table a. Is there anyway to force table a to use this index (or another) and not sequentially scan the table? I'm running 7.3.4 on RedHat EL 2.1. IN was slow in 7.3.x and before. The query will probably run much better as is in 7.4 and above. In 7.3 you want to rewrite it as a join or using EXISTS. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Moving postgres to FC disks
-With the db size being as big as, say, 30+GB, how do I move it on the new logical drive? (stop postgresql, and simply move it over somehow and make a link?) I would stop the database, move the data directory to the new volume using rsync then start up postgresql pointed at the new data directory. Providing everything is working correctly you can then remove the old data directory. -Currently, the internal RAID volume is ext3 filesystem. Any recommendations for the filesystem on the new FC volume? Rieserfs? XFS DBs are 7.4.1(RH9), and 7.2.3 (RH8). Appreciate any pointers. Thanks, Anjan -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [PERFORM] 225 times slower
Hi, I apologize for the mistake. So, I dump the database, I reload it then VACUUM ANALYZE. For each statement: I then quit postgres, start it, execute one command, then quit. Le 14 avr. 04, à 14:39, Pailloncy Jean-Gérard a écrit : dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN Limit (cost=169.79..169.79 rows=1 width=4) (actual time=502.397..502.398 rows=1 loops=1) - Sort (cost=169.79..169.86 rows=30 width=4) (actual time=502.393..502.393 rows=1 loops=1) Sort Key: rec_id - Index Scan using url_crc on url (cost=0.00..169.05 rows=30 width=4) (actual time=43.545..490.895 rows=56 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 502.520 ms (7 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=764518963 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN - Limit (cost=169.79..169.79 rows=1 width=8) (actual time=5.893..5.894 rows=1 loops=1) - Sort (cost=169.79..169.86 rows=30 width=8) (actual time=5.889..5.889 rows=1 loops=1) Sort Key: crc32, rec_id - Index Scan using url_crc on url (cost=0.00..169.05 rows=30 width=8) (actual time=0.445..5.430 rows=56 loops=1) Index Cond: (crc32 = 764518963) Filter: ((crc32 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 6.020 ms (7 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY rec_id LIMIT 1; QUERY PLAN -- Limit (cost=0.00..27.95 rows=1 width=4) (actual time=11021.875..11021.876 rows=1 loops=1) - Index Scan using url_pkey on url (cost=0.00..11625.49 rows=416 width=4) (actual time=11021.868..11021.868 rows=1 loops=1) Filter: ((crc32 0) AND (crc32 = 419903683) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 11021.986 ms (4 rows) dps=# \q dps=# explain analyze SELECT rec_id FROM url WHERE crc32!=0 AND crc32=419903683 AND status IN (200,304,206) ORDER BY crc32,rec_id LIMIT 1; QUERY PLAN - Limit (cost=2000.41..2000.41 rows=1 width=8) (actual time=48.503..48.504 rows=1 loops=1) - Sort (cost=2000.41..2001.45 rows=416 width=8) (actual time=48.499..48.499 rows=1 loops=1) Sort Key: crc32, rec_id - Index Scan using url_crc on url (cost=0.00..1982.31 rows=416 width=8) (actual time=4.848..45.452 rows=796 loops=1) Index Cond: (crc32 = 419903683) Filter: ((crc32 0) AND ((status = 200) OR (status = 304) OR (status = 206))) Total runtime: 48.656 ms (7 rows) dps=# \q So, with all fresh data, everything rebuild from scratch, on a backend that will done one and only one query, the results is strange. Why adding an ORDER BY clause on a column with one value speed up the stuff 502ms to 6ms ? Why when crc32=419903683, which is one of the most often used value in the table, the query planner chose a plan so bad (225 times slower) ? Cordialement, Jean-Gérard Pailloncy ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Wierd context-switching issue on Xeon
I verified problem on a Dual Opteron server. I temporarily killed the normal load, so the server was largely idle when the test was run. Hardware: 2x Opteron 242 Rioworks HDAMA server board 4Gb RAM OS Kernel: RedHat9 + XFS 1 proc: 10-15 cs/sec 2 proc: 400,000-420,000 cs/sec j. andrew rogers ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Why will vacuum not end?
No, but data is constantly being inserted by userid scores. It is postgres runnimg the vacuum. Dan. Well, inserts create some locks - perhaps that's the problem... Otherwise, check the pg_locks view to see if you can figure it out. Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Wierd context-switching issue on Xeon
If this helps - Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing great - procs memory swap io system cpu r b w swpd free buff cache si sobibo incs us sy id 1 0 0 1616 351820 66144 1081370400 2 01 1 0 2 7 3 0 0 1616 349712 66144 1081373600 8 1634 1362 4650 4 2 95 0 0 0 1616 347768 66144 1081412000 188 1218 1158 4203 5 1 93 0 0 1 1616 346596 66164 1081418400 8 1972 1394 4773 4 1 94 2 0 1 1616 345424 66164 108142720020 1392 1184 4197 4 2 94 Around 4k CS/sec Chipset is Intel ServerWorks GC-HE. Linux Kernel 2.4.20-28.9bigmem #1 SMP Thanks, Anjan -Original Message- From: Dirk Lutzebäck [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 10:29 AM To: Tom Lane; Josh Berkus Cc: [EMAIL PROTECTED]; Neil Conway Subject: Re: [PERFORM] Wierd context-switching issue on Xeon Dirk Lutzebaeck wrote: c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) performs well and I could not observe context switch peaks here (one user active), almost no extra semop calls Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks chipset here nor bigmem. Dirk ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Wierd context-switching issue on Xeon
Dirk Lutzebäck wrote: Dirk Lutzebaeck wrote: c) Dual XEON DP, non-bigmem, HT on, E7500 Intel chipset (Supermicro) performs well and I could not observe context switch peaks here (one user active), almost no extra semop calls Did Tom's test here: with 2 processes I'll reach 200k+ CS with peaks to 300k CS. Bummer.. Josh, I don't think you can bash the ServerWorks chipset here nor bigmem. Dave Cramer reproduced the problem on my SuperMicro dual Xeon on BSD/OS. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Wierd context-switching issue on Xeon
Anjan, Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing great - Can you run Tom's test? It takes a particular pattern of data access to reproduce the issue. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Persistent Connections
Randolf Richardson wrote: [EMAIL PROTECTED] (Nick Barr) stated in comp.databases.postgresql.performance: [EMAIL PROTECTED] wrote: [sNip] Sorry I m a little bit confused about the persistent thing!! Is it smart to use persistent connections at all if i expect 100K Users to hit the script in an hour and the script calls up to 10-15 pg functions? I have at the mom one function but the server needs 500 ms, its a little bit too much i think, and it crashed when i had 20K users Use the persistent connection but make sure the parameters in postgresql.conf match up with the Apache config. The specific settings are MaxClients in httpd.conf and max_connections in postgresql.conf. Make sure that max_connections is at least as big as MaxClients for every database that your PHP scripts connect to. Do you happen to have (or know where to get) some sample configuration files for Apache 2 and PostgreSQL for this? The documentation I've found so far is pretty sparse, and sample files would be very helpful. Beware that persistent connections in PHP behave a little differently than you would think.The connections stays open between an apache process and postgres. So each process has its own connection and you may not hit the same process on each request to the apache server. Temporary tables are not dropped automatically between refreshes on persistent connections. An example of this is to enable persistent connections and execute CREATE TEMPORARY TABLE foo ( id INTEGER ); $conn = pg_pconnect( ... ); if (!$result = pg_query($conn, CREATE TEMPORARY TABLE tmp_foo ( id INTEGER );)) { echo pg_result_error($result) ; } else { echo created ok!; } After a couple of refreshes you will get an error that states the table already exists. This was a pain to learn, especially while I was doing these operations inside of transactions. On most of my servers the connect time for postgresql was 6ms or less, so I disabled persistent connections altogether so that I could be assured that temporary tables and all php launched postgresql sessions were properly reset. As far as I know, there is no way to reset the sesssion ( cleaning up temporary tables, etc ) automatically with an SQL statement without closing the connection ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] seeking consultant for high performance, complex searching
Have you checked Tsearch2 http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ is the most feature rich Full text Search system available for postgresql. We are also using the same system in the revamped version of our website. Regds Mallah. Mark Stosberg wrote: Hello, I work for Summersault, LLC. We've been using Postgres since the days of Postgres 6.5. We're focused on building database-driven websites using Perl and Postgres. We are currently seeking help developing a search system that needs to perform complex queries with high performance. Although we have strong skills in Perl and Postgres, we are new to the arena of complex, high-performance search systems. We are seeking to hire a consultant to help this as part of the re-vamp of the 1-800-Save-A-Pet.com website. 1-800-Save-A-Pet.com is a not-for-profit organization whose website finds homes for homeless pets, promoting pet adoption and saving thousands of animal lives. Summersault, LLC is a website development firm focused on creating highly customized database driven websites. The ideal consultant has expert experience with the PostgreSQL RDBMS and the Perl programming language, and is intimately familiar with the architecture and implementation of complex database queries for high-traffic web applications. The consultant should also have a strong background in creating solutions complementary to this work, e.g. assessing hardware requirements, designing a hosting and network infrastructure, and optimizing the algorithm based on real-world feedback. The consultant will work with Summersault developers as a part of a larger application development process. Interested persons or organizations should contact Chris Hardie of Summersault, LLC at [EMAIL PROTECTED] for more information. Thanks! Mark ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Wierd context-switching issue on Xeon
I modified the code in s_lock.c to remove the spins #define SPINS_PER_DELAY 1 and it doesn't exhibit the behaviour This effectively changes the code to while(TAS(lock)) select(1); // 10ms Can anyone explain why executing TAS 100 times would increase context switches ? Dave On Tue, 2004-04-20 at 12:59, Josh Berkus wrote: Anjan, Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing great - Can you run Tom's test? It takes a particular pattern of data access to reproduce the issue. -- Dave Cramer 519 939 0336 ICQ # 14675561 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Moving postgres to FC disks
I agree on not linking and adding non-SAN disk dependancy to your DB. I'm trying to understand your FS reasoning. I have never seen XFS run faster than ReiserFS in any situation (or for that matter beat any FS in performance except JFS). XFS has some nifty very large file features, but we're talking about 30G and all modern FSs support 2G files. My tendancy would be to stay on ext3, since it is the default RH FS. I would review site preference and the SAN recommended FS and see if they add any compelling points. /Aaron - Original Message - From: Joshua D. Drake To: Anjan Dave Cc: [EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 8:27 PM Subject: Re: [PERFORM] Moving postgres to FC disks -With the db size being as big as, say, 30+GB, how do I move it on the new logical drive? (stop postgresql, and simply move it over somehow and make a link?)I would stop the database, move the data directory to the new volume using rsync then start up postgresql pointed at the new data directory.Providing everything is working correctly you can then remove the old data directory. -Currently, the internal RAID volume is ext3 filesystem. Any recommendations for the filesystem on the new FC volume? Rieserfs? XFS DBs are 7.4.1(RH9), and 7.2.3 (RH8). Appreciate any pointers. Thanks,Anjan-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
[PERFORM] pgbench written in Pro*C
I received a copy of pgbench rewritten in Pro*C, which is similar to embedded C. I think it was done so the same program could be tested on Oracle and PostgreSQL. Are folks interested in this code? Should it be put on gborg or in our /contrib/pgbench? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] pgbench written in Pro*C
Bruce Momjian [EMAIL PROTECTED] writes: I received a copy of pgbench rewritten in Pro*C, which is similar to embedded C. I think it was done so the same program could be tested on Oracle and PostgreSQL. Are folks interested in this code? Should it be put on gborg or in our /contrib/pgbench? If it requires non-free tools even to build, it is of no value. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Wierd context-switching issue on Xeon
Joe Conway wrote: In isolation, test_run.sql should do essentially no syscalls at all once it's past the initial ramp-up. On a machine that's functioning per expectations, multiple copies of test_run show a relatively low rate of semop() calls --- a few per second, at most --- and maybe a delaying select() here and there. Here's results for 7.4 on a dual Athlon server running fedora core: CPU states: cpuusernice systemirq softirq iowaitidle total 86.0%0.0% 52.4% 0.0% 0.0%0.0% 61.2% cpu00 37.6%0.0% 29.7% 0.0% 0.0%0.0% 32.6% cpu01 48.5%0.0% 22.7% 0.0% 0.0%0.0% 28.7% procs memory swap io system cpu r b swpd free buff cache si sobibo incs 1 0 120448 25764 48300 109457600 0 124 170 187 1 0 120448 25780 48300 109457600 0 0 15289 2 0 120448 25744 48300 109458000 060 141 78290 2 0 120448 25752 48300 109458000 0 0 131 140326 2 0 120448 25756 48300 109457600 040 122 140100 2 0 120448 25764 48300 109458400 060 133 136595 2 0 120448 24284 48300 109458400 0 200 138 135151 The jump in cs corresponds to starting the query in the second session. Joe ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] pgbench written in Pro*C
Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: I received a copy of pgbench rewritten in Pro*C, which is similar to embedded C. I think it was done so the same program could be tested on Oracle and PostgreSQL. Are folks interested in this code? Should it be put on gborg or in our /contrib/pgbench? If it requires non-free tools even to build, it is of no value. OK, it's only value would be if we could modify it so it compiled using our ecpg and Pro*C and the comparison program could be run on both databases. I will tell the submitter to put it on gborg if they wish. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])