Re: [PERFORM] Postgresql is very slow
On Monday 23 June 2008 07:06:54 bijayant kumar wrote:
> Hello to list,
>
> We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
> with postgresql performance. Any query, say select * from tablename, takes
> 10-15 mins to give the output, and while executing the query the system
> load goes up like anything. After the query output, the system load starts
> decreasing.

Sounds like a vacuum problem.

> Any query (select, insert, update, simple or complex) behaves in the same
> way as I have explained above.
>
> System Specification:
> OS :- CentOS 5
> Postgresql 8.1.8
> RAM :- 1 GB
> SWAP 2 GB
>
> Some relevant (uncommented) part of my /var/lib/pgsql/data/postgresql.conf:
>
> listen_addresses = 'localhost'
> max_connections = 100
> shared_buffers = 1000

Your shared_buffers seems low.

> The one more strange thing is that with the same settings on another
> server, postgresql is running very smoothly. I had run vacuum also some
> time back.

You are aware that vacuum is supposed to be an ongoing maintenance activity, right?

> Please help me out and let me know if you need any other information.
>
> Thanks & Regards,
> Bijayant Kumar

jan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
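For context on both replies: on 8.1, shared_buffers is measured in 8kB pages, so shared_buffers = 1000 is only about 8MB of buffer cache, and "ongoing maintenance" means vacuuming on a schedule rather than once. A minimal sketch of the latter (tablename is a placeholder; VACUUM FULL behavior described is 8.1-era):

```sql
-- Routine maintenance, run regularly (e.g. nightly from cron via psql),
-- rather than "some time back":
VACUUM ANALYZE;

-- A table that has gone months without vacuuming may be mostly dead row
-- versions, which would explain 10-15 minute sequential scans. On 8.1,
-- compacting it back down requires:
VACUUM FULL tablename;   -- rewrites the table; takes an exclusive lock
```

A plain VACUUM only marks dead space reusable; it does not shrink an already-bloated table, which is why the one-off VACUUM FULL is sketched for the recovery case.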
Re: [PERFORM] I/O on select count(*)
On Thursday 15 May 2008 03:02:19 Tom Lane wrote:
> Jan de Visser <[EMAIL PROTECTED]> writes:
>> Obviously, this issue is tied to the slow count(*) one, as I found out
>> the hard way. Consider the following scenario:
>>
>> * Insert row
>> * Update that row a couple of times
>> * Rinse and repeat many times
>>
>> Now somewhere during that cycle, do a select count(*) just to see where
>> you are. You will be appalled by how slow that is, due to not only the
>> usual 'slow count(*)' reasons. This whole hint bit business makes it
>> even worse, as demonstrated by the fact that running a vacuum before the
>> count(*) makes the latter noticeably faster.
>
> Uh, well, you can't blame that entirely on hint-bit updates. The vacuum
> has simply *removed* two-thirds of the rows in the system, resulting in a
> large drop in the number of rows that the select even has to look at.
>
> It's certainly true that hint-bit updates cost something, but quantifying
> how much isn't easy. The off-the-cuff answer is to do the select count(*)
> twice and see how much cheaper the second one is. But there are two big
> holes in that answer: the first is the possible cache effects from having
> already read in the pages, and the second is that the follow-up scan gets
> to avoid the visits to pg_clog that the first scan had to make (which
> after all is the point of the hint bits). I don't know any easy way to
> disambiguate the three effects that are at work here. But blaming it all
> on the costs of writing out hint-bit updates is wrong.
>
> regards, tom lane

True. But it still contributes to the fact that queries sometimes behave in a non-deterministic way, which IMHO is the major annoyance when starting to work with pgsql. And contrary to other causes (vacuum, checkpoints) this is woefully underdocumented.

jan
Re: [PERFORM] I/O on select count(*)
On 5/14/08, Greg Smith <[EMAIL PROTECTED]> wrote:
> On Wed, 14 May 2008, Alvaro Herrera wrote:
>> If neither of the bits is set, then the transaction is either in
>> progress (which you can check by examining the list of running
>> transactions in shared memory) or your process is the first one to check
>> (in which case, you need to consult pg_clog to know the status, and you
>> can update the hint bits if you find out a permanent state).
>
> So is vacuum helpful here because it will force all that to happen in one
> batch? To put that another way: if I've run a manual vacuum, is it true
> that it will have updated all the hint bits to XMIN_COMMITTED for all the
> tuples that were all done when the vacuum started?

From my benchmarking experience: yes, vacuum helps. See also below.

>> Regarding FAQs, I'm having trouble imagining putting this in the user
>> FAQ; I think it belongs in the developer's FAQ. However, a benchmarker
>> is not going to look there. Maybe we should start a benchmarker's FAQ?
>
> On the wiki I've started adding a series of things that are
> performance-related FAQs. There's three of them mixed in at the bottom of
> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions right now,
> about slow count(*) and dealing with slow queries. Here the FAQ would be
> "Why am I seeing all these writes when I'm just doing selects on my
> table?", and if it's mixed in with a lot of other performance related
> notes people should be able to find it. The answer and suggestions should
> be simple enough to be useful to a user who just noticed this behavior,
> while perhaps going into developer land for those who want to know more
> about the internals.

Obviously, this issue is tied to the slow count(*) one, as I found out the hard way. Consider the following scenario:

* Insert row
* Update that row a couple of times
* Rinse and repeat many times

Now somewhere during that cycle, do a select count(*) just to see where you are. You will be appalled by how slow that is, due to not only the usual 'slow count(*)' reasons. This whole hint bit business makes it even worse, as demonstrated by the fact that running a vacuum before the count(*) makes the latter noticeably faster.

jan
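Tom's "count it twice" experiment from the earlier message can be sketched in psql. The table, row count, and update pattern below are made-up assumptions, and as Tom notes, the second-run speedup conflates hint-bit writes, pg_clog avoidance, and plain cache warming:

```sql
-- hypothetical workload: rows inserted and then updated, no vacuum yet
CREATE TABLE hintbit_demo (id int, payload text);
INSERT INTO hintbit_demo
    SELECT g, 'x' FROM generate_series(1, 1000000) g;
UPDATE hintbit_demo SET payload = 'y';

\timing
SELECT count(*) FROM hintbit_demo;  -- first scan: consults pg_clog, writes hint bits
SELECT count(*) FROM hintbit_demo;  -- second scan: typically noticeably cheaper
```

Running a VACUUM between the two selects instead would also remove the dead row versions left by the UPDATE, which is exactly the confounding effect Tom warns about.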
Re: [PERFORM] Dual core Opterons beating quad core Xeons?
On 12/20/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Scott Marlowe <[EMAIL PROTECTED]> writes:
>> Well, I'm not even sure if those got applied or were just Tom hacking in
>> the basement or, heck, my fevered imagination. :)
>
> For the record, I hack in the attic ... or what I tell the IRS is my
> third-floor office ...

Awesome band name: 'Hacking in the Attic' :)

jan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PERFORM] Question about memory allocations
On Friday 13 April 2007 14:53:53 Carlos Moreno wrote:
> How does PG take advantage of the available memory? I mean, if I have a
> machine with, say, 4 or 8GB of memory, how will those GBs end up being
> used? They just do?? (I mean, I would find that a valid answer;

On linux the filesystem cache will gobble them up, which means indirectly pgsql profits as well (assuming no other apps poison the fs cache).

jan

--
Jan de Visser [EMAIL PROTECTED]

Baruk Khazad! Khazad ai-menu!
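The one knob that connects the filesystem cache back to PostgreSQL is the planner hint for it. A sketch of the 8.1-era setting; the machine size and value are illustrative assumptions, not from the thread:

```
# postgresql.conf: effective_cache_size is a planner hint, not an
# allocation. It should roughly reflect shared_buffers plus the OS
# filesystem cache PostgreSQL can expect. For an 8GB box with ~6GB
# free for caching, in units of 8kB pages:
effective_cache_size = 786432   # 786432 * 8kB = ~6GB
```

With this set, the planner is more willing to choose index scans, on the assumption that repeated page fetches will be served from the fs cache rather than disk.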
Re: [PERFORM] Beginner Question
On Monday 09 April 2007 05:09:53 s d wrote:
> Hi,
> I am trying to figure out how to debug a performance problem / use psql
> explain. The table in question is:
>
> # \d word_association
>          Table "public.word_association"
>  Column |          Type          |     Modifiers
> --------+------------------------+--------------------
>  word1  | character varying(128) | not null
>  word2  | character varying(128) | not null
>  count  | integer                | not null default 0
> Indexes:
>     "word1_word2_comb_unique" unique, btree (word1, word2)
>     "word1_hash_index" hash (word1)
>     "word2_hash_index" hash (word2)
>     "word_association_count_index" btree (count)
>     "word_association_index1_1" btree (word1)
>     "word_association_index2_1" btree (word2)
>
> It has multiple indices since I wanted to see which one the planner
> chooses.
>
> # explain select * FROM word_association WHERE (word1 = 'bdss' OR
>   word2 = 'bdss') AND count = 10;
>
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Bitmap Heap Scan on word_association  (cost=11.53..1192.09 rows=155 width=22)
>    Recheck Cond: (((word1)::text = 'bdss'::text) OR ((word2)::text = 'bdss'::text))
>    Filter: (count = 10)
>    ->  BitmapOr  (cost=11.53..11.53 rows=364 width=0)
>          ->  Bitmap Index Scan on word_association_index1_1  (cost=0.00..5.79 rows=190 width=0)
>                Index Cond: ((word1)::text = 'bdss'::text)
>          ->  Bitmap Index Scan on word_association_index2_1  (cost=0.00..5.67 rows=174 width=0)
>                Index Cond: ((word2)::text = 'bdss'::text)
> (8 rows)
>
> The questions:
> 1. I can understand where the cost=11.53 came from, but where did the
>    1192.09 come from? The values are in milli, right?
> 2. The query takes in reality much longer than 1 second.
>
> In short, it feels like something is very wrong here (I tried vacuum
> analyze and it didn't make much difference). Any ideas?

You need an index on (word1, word2, count). In your current setup it will have to scan all rows that satisfy word1 and word2 to see if count = 10.

jan
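Jan's suggestion as a sketch. The index names are made up, and the two-index alternative at the bottom is my assumption rather than something from the thread: since the query is an OR across two columns, each bitmap arm can only use an index leading with its own word column:

```sql
-- the index Jan suggests (name is hypothetical)
CREATE INDEX word_association_w1_w2_count
    ON word_association (word1, word2, count);

-- an alternative sometimes tried for OR queries: one index per arm,
-- each also covering the count filter
CREATE INDEX word_association_w1_count ON word_association (word1, count);
CREATE INDEX word_association_w2_count ON word_association (word2, count);
```

Either way, re-running EXPLAIN afterwards shows whether the count condition moved from the Filter line into the Index Cond, which is the point of the exercise.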
Re: [PERFORM] App very unresponsive while performing simple update
On Wednesday 31 May 2006 02:29, Brendan Duddridge wrote:
> We'll probably have to write a process to update the click_count from
> querying our product_click_history table.

How about an insert trigger on product_click_history which updates click_count every, say, 10,000 transactions or so?

jan
Re: [PERFORM] App very unresponsive while performing simple update
On Wednesday 31 May 2006 13:34, Brendan Duddridge wrote:
> Hi Jan,
>
> That sounds like a great idea! How would you control the update to occur
> only every 10,000 transactions? Is there a trigger setting for that
> somewhere?

I was thinking something like

    IF count(*) % 10000 = 0 THEN
        ... do stuff ...
    END IF;

Problem may be that that may be a bit expensive; maybe better to have a sequence and use the sequence value. Or something like that.

Also, maybe you should do the actual update of click_count not in the trigger itself, but have the trigger do a NOTIFY and have another process do a LISTEN. Depends how long the update takes.

jan

> Thanks,
>
> Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED]
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB T2G 0V9
> http://www.clickspace.com
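The sequence-plus-NOTIFY variant Jan describes can be sketched in PL/pgSQL. All object names here are hypothetical, and on 8.1 NOTIFY carries no payload, so the listening worker has to rescan product_click_history itself:

```sql
CREATE SEQUENCE click_batch_seq;

CREATE OR REPLACE FUNCTION bump_click_count() RETURNS trigger AS $$
BEGIN
    -- nextval() is a cheap per-insert counter; no count(*) scan needed
    IF nextval('click_batch_seq') % 10000 = 0 THEN
        -- don't update click_count here; just wake a worker that has
        -- issued LISTEN click_count_refresh and does the aggregation
        NOTIFY click_count_refresh;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_click_history_counter
    AFTER INSERT ON product_click_history
    FOR EACH ROW EXECUTE PROCEDURE bump_click_count();
```

Keeping the expensive UPDATE out of the trigger means the inserting transactions never pay for it; the worker picks it up asynchronously, which is exactly the "depends how long the update takes" trade-off.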
Re: [PERFORM] Query runs 38 seconds for small database!
On Monday 08 May 2006 14:10, Andrus wrote:
>> The only reason for being so conservative that I'm aware of was that it
>> was a best guess. Everyone I've talked to cuts the defaults down by at
>> least a factor of 2, sometimes even more.
>
> Can we ask that Tom will change default values to 2 times smaller in
> 8.1.4?
>
>> BTW, these parameters are already tweaked from what we started with in
>> contrib/pg_autovacuum. It would allow a table to grow to 2x larger than
>> it should be before vacuuming, as opposed to the 40% that the current
>> settings allow. But even there, is there any real reason you want to
>> have 40% bloat? To make matters worse, those settings ensure that all
>> but the smallest databases will suffer runaway bloat unless you bump up
>> the FSM settings.
>
> I created an empty table konto and loaded more than 219 records to it
> during database creation. So it seems that if the table grows from zero
> to more than 219 records, it is still not processed.

That's because you need at least 500 rows changed for an analyze and 1000 for a vacuum (autovacuum_vacuum_threshold = 1000, autovacuum_analyze_threshold = 500).

jan
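The arithmetic behind that answer, as a sketch. The 1000/500 thresholds are from the message; the 0.4 vacuum scale factor matches the "40% bloat" figure quoted above, and 0.2 for analyze is the corresponding 8.1 default:

```
# 8.1 autovacuum triggers, per table:
#   vacuum when   dead rows    > autovacuum_vacuum_threshold  + autovacuum_vacuum_scale_factor  * reltuples
#   analyze when  changed rows > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples
#
# For the 219-row konto table:
#   vacuum needs   1000 + 0.4 * 219 ~ 1088 dead rows
#   analyze needs   500 + 0.2 * 219 ~  544 changed rows
# which a 219-row table can never accumulate; hence it is never processed.
```

This is why small tables effectively opt out of autovacuum under these defaults: the fixed threshold dominates, and only an explicit VACUUM ANALYZE touches them.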
Re: [PERFORM] Performance Issues on Opteron Dual Core
On Wednesday 03 May 2006 03:29, Magnus Hagander wrote:
>>>> FWIW, I've found problems running PostgreSQL on Windows in a multi-CPU
>>>> environment on w2k3. It runs fine for some period, and then CPU and
>>>> throughput drop to zero. So far I've been unable to track down any
>>>> more information than that, other than the fact that I haven't been
>>>> able to reproduce this on any single-CPU machines.
>>>
>>> I have had previous correspondence about this with Magnus (search
>>> -general and -hackers). If you uninstall SP1 the problem goes away. We
>>> played a bit with potential fixes but didn't find any.
>>
>> Interesting; does SP2 fix the problem? Anything we can do over here to
>> help?
>
> There is no SP2 for Windows 2003.

That's what I thought. Jim confused me there for a minute.

> Have you tried this with latest-and-greatest CVS HEAD? Meaning with the
> new semaphore code that was committed a couple of days ago?

No I haven't. Worth a test on a rainy afternoon, I'd say...

jan
Re: [PERFORM] Performance Issues on Opteron Dual Core
On Tuesday 02 May 2006 16:28, Jim C. Nasby wrote:
> On Sun, Apr 30, 2006 at 10:59:56PM +1200, Mark Kirkwood wrote:
>> Pgadmin can give misleading times for queries that return large result
>> sets over a network, due to:
>>
>> 1/ It takes time to format the (large) result set for display.
>> 2/ It has to count the time spent waiting for the (large) result set to
>>    travel across the network.
>>
>> You aren't running Pgadmin off the dev server are you? If not, check
>> your network link to dev and prod: is one faster than the other? (etc).
>> To eliminate Pgadmin and the network as factors, try wrapping your query
>> in a 'SELECT count(*) FROM (your query here) AS a', and see if it
>> changes anything!
>
> FWIW, I've found problems running PostgreSQL on Windows in a multi-CPU
> environment on w2k3. It runs fine for some period, and then CPU and
> throughput drop to zero. So far I've been unable to track down any more
> information than that, other than the fact that I haven't been able to
> reproduce this on any single-CPU machines.

I have had previous correspondence about this with Magnus (search -general and -hackers). If you uninstall SP1 the problem goes away. We played a bit with potential fixes but didn't find any.

jan
[PERFORM] Slow SELECTS after large update cycle
Hello,

After fixing the hanging problems I reported here earlier (by uninstalling W2K3 SP1), I'm running into another weird one.

After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), the first 'reporting' type query on tables involved in that write cycle is very slow. As an example, I have a query which according to EXPLAIN ANALYZE takes about 1.1s actually taking 46s. After this one hit, everything is back to normal, and subsequent executions of the same query are in fact subsecond. Restarting the appserver and pgsql does not make the slowness reappear; only running another batch will.

During the 'write'/batch cycle, a large number of rows in various tables are inserted and subsequently (repeatedly) updated. The reporting type queries after that are basically searches on those tables.

Anybody any ideas?

Thanks,
jan
Re: [PERFORM] Slow SELECTS after large update cycle
On Wednesday 15 March 2006 18:21, Simon Riggs wrote:
> On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:
>> After fixing the hanging problems I reported here earlier (by
>> uninstalling W2K3 SP1), I'm running into another weird one.
>>
>> After doing a +/- 8hr cycle of updates and inserts (what we call a
>> 'batch'), the first 'reporting' type query on tables involved in that
>> write cycle is very slow. As an example, I have a query which according
>> to EXPLAIN ANALYZE takes about 1.1s actually taking 46s. After this one
>> hit, everything is back to normal, and subsequent executions of the same
>> query are in fact subsecond. Restarting the appserver and pgsql does not
>> make the slowness reappear; only running another batch will.
>>
>> During the 'write'/batch cycle, a large number of rows in various tables
>> are inserted and subsequently (repeatedly) updated. The reporting type
>> queries after that are basically searches on those tables.
>>
>> Anybody any ideas?
>
> This is caused by updating the commit status hint bits on each row
> touched by the SELECTs. This turns the first SELECT into a write
> operation. Try running a scan of the whole table to take the hit before
> you give it back to the users.
>
> Best Regards, Simon Riggs

Thanks Simon. I didn't know about the cause, but I expected the answer to be 'deal with it', as it is. At least I can explain it now...

jan
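Simon's "take the hit yourself" advice, as a sketch. The table name is a placeholder; any query that visits every row, such as a sequential-scan count, will write the hint bits:

```sql
-- at the end of the batch job, before users run reports:
SELECT count(*) FROM batch_target_table;   -- forces the hint-bit writes now

-- or, since the batch also left many dead row versions behind:
VACUUM ANALYZE batch_target_table;         -- sets hint bits, reclaims space,
                                           -- and refreshes planner stats
```

The VACUUM ANALYZE variant costs more during the batch window but also addresses the bloat from the repeated updates, so the first report query pays for neither.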
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 04:20, Magnus Hagander wrote:
>>> Is it possible to get a stack trace from the stuck process? I dunno if
>>> you've got anything gdb-equivalent under Windows, but that's the first
>>> thing I'd be interested in ...
>>
>> Here ya go:
>>
>> http://www.devisser-siderius.com/stack1.jpg
>> http://www.devisser-siderius.com/stack2.jpg
>> http://www.devisser-siderius.com/stack3.jpg
>>
>> There are three threads in the process. I guess thread 1 (stack1.jpg) is
>> the most interesting. I also noted that cranking up concurrency in my
>> app reproduces the problem in about 4 minutes ;-)

Just reproduced again.

> Actually, stack2 looks very interesting. Does it stay stuck in
> pg_queue_signal? That's really not supposed to happen.

Yes it does.

> Also, can you confirm that stack1 actually *stops* in
> pgwin32_waitforsinglesocket? Or does it go out and come back? ;-) (A good
> signal of this is to check the cswitch delta. If it stays at zero, then
> it's stuck. If it shows any values, that means it's actually going out
> and coming back.)

I only see CSwitch change once I click OK on the thread window. Once I do that, it goes up to 3 and back to blank again. The 'context switches' counter does not increase like it does for other processes (like e.g. process explorer itself).

Another thing which may or may not be of interest: nothing is listed in the 'TCP/IP' tab for the stuck process. I would have expected to see at least the socket of the client connection there??

> And finally, is this 8.0 or 8.1? There have been some significant changes
> in the handling of the signals between the two...

This is 8.1.3 on Windows 2003 Server. Also reproduced on 8.1.0 and 8.1.1 (also on 2K3).

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 09:03, Jan de Visser wrote:
> On Friday 10 March 2006 04:20, Magnus Hagander wrote:
>> Actually, stack2 looks very interesting. Does it stay stuck in
>> pg_queue_signal? That's really not supposed to happen.
>
> Yes it does.

An update on that: there are actually *two* processes in this state, both hanging in pg_queue_signal.

I've looked at the source of that, and the obvious candidate for hanging is EnterCriticalSection. I also found this:

http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx

where they say:

"In addition, for Windows 2003, SP1, the EnterCriticalSection API has a subtle change that's intended to resolve many of the lock convoy issues. Before Win2003 SP1, if 10 threads were blocked on EnterCriticalSection and all 10 threads had the same priority, then EnterCriticalSection would service those threads on a FIFO (first-in, first-out) basis. Starting in Windows 2003 SP1, the EnterCriticalSection will wake up a random thread from the waiting threads. If all the threads are doing the same thing (like a thread pool) this won't make much of a difference, but if the different threads are doing different work (like the critical section protecting a widely accessed object), this will go a long way towards removing lock convoy semantics."

Could it be they broke it when they did that?

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 09:32, Jan de Visser wrote:
> An update on that: there are actually *two* processes in this state, both
> hanging in pg_queue_signal. I've looked at the source of that, and the
> obvious candidate for hanging is EnterCriticalSection.

See also this: http://bugs.mysql.com/bug.php?id=12071

It appears the mysql people ran into this and concluded it is a Windows bug they needed to work around.

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 10:11, Magnus Hagander wrote:
>> Could it be they broke it when they did that?
>
> In theory, yes, but it still seems a bit far fetched :-(

Well, I rolled back SP1 and am running my test again. Looking much better; it hasn't locked up in 45 mins now, whereas before it would lock up within 5 mins. So I think they broke something.

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 13:25, Magnus Hagander wrote:
>>>> Could it be they broke it when they did that?
>>>
>>> In theory, yes, but it still seems a bit far fetched :-(
>>
>> Well, I rolled back SP1 and am running my test again. Looking much
>> better; it hasn't locked up in 45 mins now, whereas before it would lock
>> up within 5 mins. So I think they broke something.
>
> Wow. I guess I was lucky that I didn't say it was impossible :-)
>
> But what is really happening? What other thread is actually holding the
> critical section at this point, causing us to block? The only place it
> gets held is while looping the signal queue, but it is released while
> calling the signal function itself... But they obviously *have* been
> messing with critical sections, so maybe they accidentally changed
> something else as well...
>
> What bothers me is that nobody else has reported this. It could be that
> this was exposed by the changes to the signal handling done for 8.1, and
> the ppl with this level of concurrency are either still on 8.0 or just
> not on SP1 for their windows boxes yet...
>
> Do you have any other software installed on the machine? That might
> possibly interfere in some way?

Just a JDK, JBoss, cygwin (running sshd), and a VNC server. I don't think that interferes.

> But let's have it run for a bit longer to confirm this does help.

I turned it off after 2.5hr. The longest I had to wait before, with less load, was 1.45hr.

> If so, we could perhaps recode that part using a Mutex instead of a
> critical section; since it's not a performance critical path, the
> difference shouldn't be large. If I code up a patch for that, can you
> re-apply SP1 and test it? Or is this a production system you can't really
> touch?

I can do whatever the hell I want with it, so if you could cook up a patch that would be great.

As a BTW: I reinstalled SP1 and turned stats collection off. That also seems to work, but is not really a solution since we want to use autovacuuming.

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Friday 10 March 2006 14:27, Jan de Visser wrote:
> As a BTW: I reinstalled SP1 and turned stats collection off. That also
> seems to work, but is not really a solution since we want to use
> autovacuuming.

I lied. It hangs now. Just takes a lot longer...

jan
Re: [PERFORM] Hanging queries on dual CPU windows
I have more information on this issue.

First off, the problem now happens after about 1-2 hours, as opposed to the 6-8 I mentioned earlier. Yay for shorter test cycles.

Furthermore, it does not happen on Linux machines, both single CPU and dual CPU, nor on single CPU windows machines. We can only reproduce on a dual CPU windows machine, and if we take one CPU out, it does not happen.

I executed the following after it hung:

db=# select l.pid, c.relname, l.mode, l.granted, l.page, l.tuple
     from pg_locks l, pg_class c
     where c.oid = l.relation
     order by l.pid;

which showed me that several transactions were waiting for a particular row which was locked by another transaction. This transaction had no pending locks (so no deadlock), but just does not complete and hence never relinquishes the lock.

What gives? Has anybody ever heard of problems like this on dual CPU windows machines?

jan

On Monday 06 March 2006 09:38, Jan de Visser wrote:
> Hello,
>
> While doing performance tests on Windows Server 2003 we observed the
> following two problems.
>
> Environment: J2EE application running in JBoss application server,
> against pgsql 8.1 database. Load is caused by a smallish number of (very)
> complex transactions, typically about 5-10 concurrently.
>
> The first one, which bothers me the most, is that after about 6-8 hours
> the application stops processing. No errors are reported, neither by the
> JDBC driver nor by the server, but when I kill the application server, I
> see that all my connections hang in a SQL statement (which never seems to
> return):
>
> 2006-03-03 08:17:12 4504 6632560 LOG: duration: 45087000.000 ms
> statement: EXECUTE unnamed [PREPARE: SELECT objID FROM objects WHERE
> objID = $1 FOR UPDATE]
>
> I think I can reliably reproduce this by loading the app, and waiting a
> couple of hours.
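A variant of the pg_locks query above that pairs each waiter with the pid holding the contested lock. This is a sketch: 8.1 has no pg_blocking_pids(), so it joins ungranted lock requests to granted locks on the same relation/page/tuple:

```sql
SELECT w.pid AS waiting_pid,
       h.pid AS holding_pid,
       c.relname,
       w.mode AS waiting_mode
FROM pg_locks w
JOIN pg_locks h ON h.relation = w.relation
               AND coalesce(h.page, -1)  = coalesce(w.page, -1)
               AND coalesce(h.tuple, -1) = coalesce(w.tuple, -1)
               AND h.granted
               AND NOT w.granted
JOIN pg_class c ON c.oid = w.relation
ORDER BY w.pid;
```

With the holding_pid in hand, one can then look at that backend (its current query, or its stack, as done later in this thread) instead of inferring the blocker by eye.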
Re: [PERFORM] Hanging queries on dual CPU windows
On Thursday 09 March 2006 15:10, Tom Lane wrote:
> Jan de Visser <[EMAIL PROTECTED]> writes:
>> Furthermore, it does not happen on Linux machines, both single CPU and
>> dual CPU, nor on single CPU windows machines. We can only reproduce on a
>> dual CPU windows machine, and if we take one CPU out, it does not
>> happen. ... Which showed me that several transactions were waiting for
>> a particular row which was locked by another transaction. This
>> transaction had no pending locks (so no deadlock), but just does not
>> complete and hence never relinquishes the lock.
>
> Is the stuck transaction still consuming CPU time, or just stopped?

CPU drops off. In fact, that's my main clue something's wrong ;-)

> Is it possible to get a stack trace from the stuck process? I dunno if
> you've got anything gdb-equivalent under Windows, but that's the first
> thing I'd be interested in ...

I wouldn't know. I'm hardly a windows expert. Prefer not to touch the stuff, myself. Can do some research though...

> regards, tom lane

jan
Re: [PERFORM] Hanging queries on dual CPU windows
On Thursday 09 March 2006 15:10, Tom Lane wrote:
> Is it possible to get a stack trace from the stuck process? I dunno if
> you've got anything gdb-equivalent under Windows, but that's the first
> thing I'd be interested in ...

Here ya go:

http://www.devisser-siderius.com/stack1.jpg
http://www.devisser-siderius.com/stack2.jpg
http://www.devisser-siderius.com/stack3.jpg

There are three threads in the process. I guess thread 1 (stack1.jpg) is the most interesting. I also noted that cranking up concurrency in my app reproduces the problem in about 4 minutes ;-)

With thanks to Magnus Hagander for the Process Explorer hint.

jan
[PERFORM] Hanging queries and I/O exceptions
Hello,

While doing performance tests on Windows Server 2003 we observed the following two problems.

Environment: J2EE application running in JBoss application server, against pgsql 8.1 database. Load is caused by a smallish number of (very) complex transactions, typically about 5-10 concurrently.

The first one, which bothers me the most, is that after about 6-8 hours the application stops processing. No errors are reported, neither by the JDBC driver nor by the server, but when I kill the application server, I see that all my connections hang in a SQL statement (which never seems to return):

2006-03-03 08:17:12 4504 6632560 LOG: duration: 45087000.000 ms statement: EXECUTE unnamed [PREPARE: SELECT objID FROM objects WHERE objID = $1 FOR UPDATE]

I think I can reliably reproduce this by loading the app, and waiting a couple of hours.

The second problem is less predictable:

JDBC exception: An I/O error occured while sending to the backend.
org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:214)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)

In my server log, I have:

2006-03-02 12:31:02 5692 6436342 LOG: could not receive data from client: A non-blocking socket operation could not be completed immediately.

At the time my box is fairly heavily loaded, but still responsive. Server and JBoss appserver live on the same dual 2Ghz Opteron.

A quick Google told me that:

1. More people have seen this.
2. No solutions.
3. The server message appears to indicate an unhandled WSAEWOULDBLOCK winsock error on recv(), which MSDN says is to be expected and should be retried.

Is this a known bug?

jan