Re: [PERFORM] Postgresql is very slow

2008-06-23 Thread Jan de Visser
On Monday 23 June 2008 07:06:54 bijayant kumar wrote:
 Hello to list,

 We have a CentOS-5 server with postgresql-8.1.8 installed. I am struggling
 with postgresql performance. Any query say select * from tablename takes
 10-15 mins to give the output, and while executing the query system loads
 goes up like anything. After the query output, system loads starts
 decresing.

Sounds like a vacuum problem.


 Any query select,insert,update simple or complex behaves in the same way,
 what i have explained above.

 System Specification:

 OS :- CentOs 5
 Postgresql 8.1.8
 RAM :- 1 GB
 SWAP 2 GB

 Some relevent part(uncommented) of my  /var/lib/pgsql/data/postgresql.conf

 listen_addresses = 'localhost'
 max_connections = 100
 shared_buffers = 1000

You shared_buffers seems low.


 The one more strange thing is that with the same setting on another server,
 postgresql is running very smooth. I had run vacum also some times back.

You are aware that vacuum is supposed to be an ongoing maintenance activity, 
right? 


 Please help me out and let me know if you need any other information.

 Thanks  Regards,

 Bijayant Kumar

 Send instant messages to your online friends http://uk.messenger.yahoo.com

jan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Jan de Visser
On Thursday 15 May 2008 03:02:19 Tom Lane wrote:
 Jan de Visser [EMAIL PROTECTED] writes:
  Obviously, this issue is tied to the slow count(*) one, as I found out
  the hard way. Consider the following scenario:
  * Insert row
  * Update that row a couple of times
  * Rinse and repeat many times
 
  Now somewhere during that cycle, do a select count(*) just to see
  where you are. You will be appalled by how slow that is, due to not
  only the usual 'slow count(*)' reasons. This whole hint bit business
  makes it even worse, as demonstrated by the fact that running a vacuum
  before the count(*) makes the latter noticably faster.

 Uh, well, you can't blame that entirely on hint-bit updates.  The vacuum
 has simply *removed* two-thirds of the rows in the system, resulting in
 a large drop in the number of rows that the select even has to look at.

 It's certainly true that hint-bit updates cost something, but
 quantifying how much isn't easy.  The off-the-cuff answer is to do the
 select count(*) twice and see how much cheaper the second one is.  But
 there are two big holes in that answer: the first is the possible cache
 effects from having already read in the pages, and the second is that
 the follow-up scan gets to avoid the visits to pg_clog that the first
 scan had to make (which after all is the point of the hint bits).

 I don't know any easy way to disambiguate the three effects that are at
 work here.  But blaming it all on the costs of writing out hint-bit
 updates is wrong.

   regards, tom lane

True. But it still contributes to the fact that queries sometimes behave in a 
non-deterministic way, which IMHO is the major annoyance when starting to 
work with pgsql. And contrary to other causes (vacuum, checkpoints) this is 
woefully underdocumented.

jan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-14 Thread Jan de Visser
On 5/14/08, Greg Smith [EMAIL PROTECTED] wrote:
 On Wed, 14 May 2008, Alvaro Herrera wrote:


  If neither of the bits is set, then the transaction is either in progress
 (which you can check by examining the list of running transactions in shared
 memory) or your process is the first one to check (in which case, you need
 to consult pg_clog to know the status, and you can update the hint bits if
 you find out a permanent state).
 

  So is vacuum helpful here because it will force all that to happen in one
 batch?  To put that another way:  if I've run a manual vacuum, is it true
 that it will have updated all the hint bits to XMIN_COMMITTED for all the
 tuples that were all done when the vacuum started?

From my benchmarking experience: Yes, vacuum helps. See also below.



  Regarding FAQs, I'm having trouble imagining putting this in the user
  FAQ; I think it belongs into the developer's FAQ.  However, a
  benchmarker is not going to look there.  Maybe we should start a
  benchmarker's FAQ?
 

  On the wiki I've started adding a series of things that are
 performance-related FAQs.  There's three of them mixed in the bottom of
 http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
 right now, about slow count(*) and dealing with slow queries.

  Here the FAQ would be Why am I seeing all these writes when I'm just doing
 selects on my table?, and if it's mixed in with a lot of other performance
 related notes people should be able to find it.  The answer and suggestions
 should be simple enough to be useful to a user who just noticed this
 behavior, while perhaps going into developer land for those who want to know
 more about the internals.

Obviously, this issue is tied to the slow count(*) one, as I found out
the hard way. Consider the following scenario:
* Insert row
* Update that row a couple of times
* Rinse and repeat many times

Now somewhere during that cycle, do a select count(*) just to see
where you are. You will be appalled by how slow that is, due to not
only the usual 'slow count(*)' reasons. This whole hint bit business
makes it even worse, as demonstrated by the fact that running a vacuum
before the count(*) makes the latter noticably faster.

jan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-20 Thread Jan de Visser
On 12/20/07, Tom Lane [EMAIL PROTECTED] wrote:
 Scott Marlowe [EMAIL PROTECTED] writes:
  Well, I'm not even sure if those got applied or were just Tom hacking
  in the basement or, heck, my fevered imagination. :)

 For the record, I hack in the attic ... or what I tell the IRS is my
 third-floor office ...


Awesome band name - 'Hacking in the Attic'

:)

jan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Question about memory allocations

2007-04-13 Thread Jan de Visser
On Friday 13 April 2007 14:53:53 Carlos Moreno wrote:
 How does PG take advantage of the available memory?  I mean, if I have a
 machine with, say, 4 or 8GB of memory, how will those GBs would end
 up being used?   They just do??   (I mean, I would find that a vaild
 answer;

On linux the filesystem cache will gobble them up, which means indirectly 
pgsql profits as well (assuming no other apps poison the fs cache).

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Beginner Question

2007-04-11 Thread Jan de Visser
On Monday 09 April 2007 05:09:53 s d wrote:
 Hi,
 I am trying to figure out how to debug a performance problem / use psql
 explain. The table in question is:
 # \d word_association;
Table public.word_association
  Column |  Type  | Modifiers
 ++
  word1  | character varying(128) | not null
  word2  | character varying(128) | not null
  count  | integer| not null default 0
 Indexes:
 word1_word2_comb_unique unique, btree (word1, word2)
 word1_hash_index hash (word1)
 word2_hash_index hash (word2)
 word_association_count_index btree (count)
 word_association_index1_1 btree (word1)
 word_association_index2_1 btree (word2)

 It has multiple indices since i wanted to see which one the planner choses.


 # explain select * FROM word_association WHERE (word1 = 'bdss' OR
 word2 = 'bdss')  AND count = 10;
QUERY PLAN
 ---
- Bitmap Heap Scan on word_association 
 (cost=11.53..1192.09 rows=155 width=22) Recheck Cond: (((word1)::text =
 'bdss'::text) OR ((word2)::text = 'bdss'::text))
Filter: (count = 10)
-  BitmapOr  (cost=11.53..11.53 rows=364 width=0)
  -  Bitmap Index Scan on word_association_index1_1
 (cost=0.00..5.79 rows=190 width=0)
Index Cond: ((word1)::text = 'bdss'::text)
  -  Bitmap Index Scan on word_association_index2_1
 (cost=0.00..5.67 rows=174 width=0)
Index Cond: ((word2)::text = 'bdss'::text)
 (8 rows)

 The questions:
 1. i can undestand where the cost=11.53 came from but where did the
 1192.09 come form? The values are in milli right ?
 2. the query takes  in reality much longer than 1 second.

 In short, it feels like something is very wrong here (i tried vacuum
 analyze and it didn't do much diff).
 any ideas ?

You need an index on (word1, word2, count). In your current setup it will have 
to scan all rows that satisfy word1 and word2 to see if count = 10.

jan


-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] App very unresponsive while performing simple update

2006-05-31 Thread Jan de Visser
On Wednesday 31 May 2006 02:29, Brendan Duddridge wrote:
 We'll probably have to write a process to update the click_count from  
 querying our product_click_history table.

How about an insert trigger on product_click_history which updates click_count 
every say 1 transactions or so?

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] App very unresponsive while performing simple update

2006-05-31 Thread Jan de Visser
On Wednesday 31 May 2006 13:34, Brendan Duddridge wrote:
 Hi Jan,

 That sounds like a great idea! How would you control the update to
 occur only every 10,000 transactions?

 Is there a trigger setting for that somewhere?

I was thinking something like

IF count(*) % 1 = 0 then
  ... do stuff ...
end if

Problem may be that that may be a bit expensive; maybe better to have a 
sequence and use the sequence value. 

Or something like that.

Also, maybe you should do the actual update of click_count not in the trigger 
itself, but have the trigger do a NOTIFY and have another process do a 
LISTEN. Depends how long the update takes.

jan


 Thanks,

 
 Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

 ClickSpace Interactive Inc.
 Suite L100, 239 - 10th Ave. SE
 Calgary, AB  T2G 0V9

 http://www.clickspace.com

 On May 31, 2006, at 6:34 AM, Jan de Visser wrote:
  On Wednesday 31 May 2006 02:29, Brendan Duddridge wrote:
  We'll probably have to write a process to update the click_count from
  querying our product_click_history table.
 
  How about an insert trigger on product_click_history which updates
  click_count
  every say 1 transactions or so?
 
  jan
 
  --
  --
  Jan de Visser [EMAIL PROTECTED]
 
  Baruk Khazad! Khazad ai-menu!
  --
 
  ---(end of
  broadcast)---
  TIP 3: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/docs/faq

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query runs 38 seconds for small database!

2006-05-08 Thread Jan de Visser
On Monday 08 May 2006 14:10, Andrus wrote:
  The only reason for being so conservative that I'm aware of was that it
  was a best guess. Everyone I've talked to cuts the defaults down by at
  least a factor of 2, sometimes even more.

 Can we ask that Tom will change default values to 2 times smaller in 8.1.4
 ?

  BTW, these parameters are already tweaked from what we started with in
  contrib/pg_autovacuum. It would allow a table to grow to 2x larger than
  it should be before vacuuming, as opposed to the 40% that the current
  settings allow. But even there, is there any real reason you want to
  have 40% bloat? To make matters worse, those settings ensure that all
  but the smallest databases will suffer runaway bloat unless you bump up

  recprd the FSM settings.

 I created empty table konto and loaded more that 219 records to it during
 database creation.
 So it seems that if table grows from zero to more than 219 times larger
 then it was still not processed.

That's because you need at least 500 rows for analyze and 100 for a vacuum, 
(autovacuum_vacuum_threshold = 1000, autovacuum_analyze_threshold = 500).


 Andrus.

jan




 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance Issues on Opteron Dual Core

2006-05-03 Thread Jan de Visser
On Wednesday 03 May 2006 03:29, Magnus Hagander wrote:
FWIW, I've found problems running PostgreSQL on Windows in a
multi-CPU environment on w2k3. It runs fine for some period, and
then CPU and throughput drop to zero. So far I've been unable to
track down any more information than that, other than the
 
  fact that
 
I haven't been able to reproduce this on any single-CPU machines.
  
   I have had previous correspondence about this with Magnus (search
   -general and -hackers). If you uninstall SP1 the problem
 
  goes away. We
 
   played a bit with potential fixes but didn't find any.
 
  Interesting; does SP2 fix the problem? Anything we can do
  over here to help?

 There is no SP2 for Windows 2003.

That's what I thought. Jim confused me there for a minute.


 Have you tried this with latest-and-greatest CVS HEAD? Meaning with the
 new semaphore code that was committed a couple of days ago?

No I haven't. Worth a test on a rainy afternoon I'd say...


 //Magnus

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance Issues on Opteron Dual Core

2006-05-02 Thread Jan de Visser
On Tuesday 02 May 2006 16:28, Jim C. Nasby wrote:
 On Sun, Apr 30, 2006 at 10:59:56PM +1200, Mark Kirkwood wrote:
  Pgadmin can give misleading times for queries that return large result
  sets over a network, due to:
 
  1/ It takes time to format the (large) result set for display.
  2/ It has to count the time spent waiting for the (large) result set to
  travel across the network.
 
  You aren't running Pgadmin off the dev server are you? If not check your
  network link to dev and prod  - is one faster than the other? (etc).
 
  To eliminate Pgadmin and the network as factors try wrapping your query
  in a 'SELECT count(*) FROM (your query here) AS a', and see if it
  changes anything!

 FWIW, I've found problems running PostgreSQL on Windows in a multi-CPU
 environment on w2k3. It runs fine for some period, and then CPU and
 throughput drop to zero. So far I've been unable to track down any more
 information than that, other than the fact that I haven't been able to
 reproduce this on any single-CPU machines.

I have had previous correspondence about this with Magnus (search -general 
and -hackers). If you uninstall SP1 the problem goes away. We played a bit 
with potential fixes but didn't find any.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Jan de Visser
Hello,

After fixing the hanging problems I reported here earlier (by uninstalling 
W2K3 SP1), I'm running into another weird one.

After doing a +/- 8hr cycle of updates and inserts (what we call a 'batch'), 
the first 'reporting' type query on tables involved in that write cycle is 
very slow. As an example, I have a query which according to EXPLAIN ANALYZE 
takes about 1.1s taking 46s. After this one hit, everything is back to 
normal, and subsequent executions of the same query are in fact subsecond. 
Restarting the appserver and pgsql does not make the slowness re-appear, only 
running another batch will.

During the 'write'/batch cycle, a large number of rows in various tables are 
inserted and subsequently (repeatedly) updated. The reporting type queries 
after that are basically searches on those tables.

Anybody any ideas?

Thanks,

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow SELECTS after large update cycle

2006-03-15 Thread Jan de Visser
On Wednesday 15 March 2006 18:21, Simon Riggs wrote:
 On Wed, 2006-03-15 at 14:39 -0500, Jan de Visser wrote:
  After fixing the hanging problems I reported here earlier (by
  uninstalling W2K3 SP1), I'm running into another weird one.
 
  After doing a +/- 8hr cycle of updates and inserts (what we call a
  'batch'), the first 'reporting' type query on tables involved in that
  write cycle is very slow. As an example, I have a query which according
  to EXPLAIN ANALYZE takes about 1.1s taking 46s. After this one hit,
  everything is back to normal, and subsequent executions of the same query
  are in fact subsecond. Restarting the appserver and pgsql does not make
  the slowness re-appear, only running another batch will.
 
  During the 'write'/batch cycle, a large number of rows in various tables
  are inserted and subsequently (repeatedly) updated. The reporting type
  queries after that are basically searches on those tables.
 
  Anybody any ideas?

 This is caused by updating the commit status hint bits on each row
 touched by the SELECTs. This turns the first SELECT into a write
 operation.

 Try running a scan of the whole table to take the hit before you give it
 back to the users.

Thanks Simon. I didn't know about the cause, but I expected the answer to be 
'deal with it', as it is. At least I can explain it now...


 Best Regards, Simon Riggs

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 04:20, Magnus Hagander wrote:
   Is it possible to get a stack trace from the stuck process?
 
   I dunno
 
   if you've got anything gdb-equivalent under Windows, but that's the
   first thing I'd be interested in ...
 
  Here ya go:
 
  http://www.devisser-siderius.com/stack1.jpg
  http://www.devisser-siderius.com/stack2.jpg
  http://www.devisser-siderius.com/stack3.jpg
 
  There are three threads in the process. I guess thread 1
  (stack1.jpg) is the most interesting.
 
  I also noted that cranking up concurrency in my app
  reproduces the problem in about 4 minutes ;-)


Just reproduced again. 

 Actually, stack2 looks very interesting. Does it stay stuck in
 pg_queue_signal? That's really not supposed to happen.

Yes it does. 


 Also, can you confirm that stack1 actually *stops* in
 pgwin32_waitforsinglesocket? Or does it go out and come back? ;-)

 (A good signal of this is to check the cswitch delta. If it stays at zero,
 then it's stuck. If it shows any values, that means it's actuall going out
 and coming back)

I only see CSwitch change once I click OK on the thread window. Once I do 
that, it goes up to 3 and back to blank again. The 'context switches' counter 
does not increase like it does for other processes (like e.g. process 
explorer itself).

Another thing which may or may not be of interest: Nothing is listed in the 
'TCP/IP' tab for the stuck process. I would have expected to see at least the 
socket of the client connection there??


 And finally, is this 8.0 or 8.1? There have been some significant changes
 in the handling of the signals between the two...

This is 8.1.3 on Windows 2003 Server. Also reproduced on 8.1.0 and 8.1.1 (also 
on 2K3). 


 //Magnus

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 09:03, Jan de Visser wrote:
 On Friday 10 March 2006 04:20, Magnus Hagander wrote:
Is it possible to get a stack trace from the stuck process?
  
    I dunno
  
if you've got anything gdb-equivalent under Windows, but that's the
first thing I'd be interested in ...
  
   Here ya go:
  
   http://www.devisser-siderius.com/stack1.jpg
   http://www.devisser-siderius.com/stack2.jpg
   http://www.devisser-siderius.com/stack3.jpg
  
   There are three threads in the process. I guess thread 1
   (stack1.jpg) is the most interesting.
  
   I also noted that cranking up concurrency in my app
   reproduces the problem in about 4 minutes ;-)

 Just reproduced again.

  Actually, stack2 looks very interesting. Does it stay stuck in
  pg_queue_signal? That's really not supposed to happen.

 Yes it does.

An update on that: There is actually *two* processes in this state, both 
hanging in pg_queue_signal. I've looked at the source of that, and the 
obvious candidate for hanging is EnterCriticalSection. I also found this:

http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx

where they say:


In addition, for Windows 2003, SP1, the EnterCriticalSection API has a subtle 
change that's intended tor resolve many of the lock convoy issues.  Before 
Win2003 SP1, if 10 threads were blocked on EnterCriticalSection and all 10 
threads had the same priority, then EnterCriticalSection would service those 
threads in a FIFO (first -in, first-out) basis.  Starting in Windows 2003 
SP1, the EnterCriticalSection will wake up a random thread from the waiting 
threads.  If all the threads are doing the same thing (like a thread pool) 
this won't make much of a difference, but if the different threads are doing 
different work (like the critical section protecting a widely accessed 
object), this will go a long way towards removing lock convoy semantics.


Could it be they broke it when they did that



  Also, can you confirm that stack1 actually *stops* in
  pgwin32_waitforsinglesocket? Or does it go out and come back? ;-)
 
  (A good signal of this is to check the cswitch delta. If it stays at
  zero, then it's stuck. If it shows any values, that means it's actuall
  going out and coming back)

 I only see CSwitch change once I click OK on the thread window. Once I do
 that, it goes up to 3 and back to blank again. The 'context switches'
 counter does not increase like it does for other processes (like e.g.
 process explorer itself).

 Another thing which may or may not be of interest: Nothing is listed in the
 'TCP/IP' tab for the stuck process. I would have expected to see at least
 the socket of the client connection there??

  And finally, is this 8.0 or 8.1? There have been some significant changes
  in the handling of the signals between the two...

 This is 8.1.3 on Windows 2003 Server. Also reproduced on 8.1.0 and 8.1.1
 (also on 2K3).

  //Magnus

 jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 09:32, Jan de Visser wrote:
   Actually, stack2 looks very interesting. Does it stay stuck in
   pg_queue_signal? That's really not supposed to happen.
 
  Yes it does.

 An update on that: There is actually *two* processes in this state, both
 hanging in pg_queue_signal. I've looked at the source of that, and the
 obvious candidate for hanging is EnterCriticalSection. I also found this:

 http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx

 where they say:

 
 In addition, for Windows 2003, SP1, the EnterCriticalSection API has a
 subtle change that's intended tor resolve many of the lock convoy issues.
  Before Win2003 SP1, if 10 threads were blocked on EnterCriticalSection and
 all 10 threads had the same priority, then EnterCriticalSection would
 service those threads in a FIFO (first -in, first-out) basis.  Starting in
 Windows 2003 SP1, the EnterCriticalSection will wake up a random thread
 from the waiting threads.  If all the threads are doing the same thing
 (like a thread pool) this won't make much of a difference, but if the
 different threads are doing different work (like the critical section
 protecting a widely accessed object), this will go a long way towards
 removing lock convoy semantics. 

 Could it be they broke it when they did that

See also this:

http://bugs.mysql.com/bug.php?id=12071

It appears the mysql people ran into this and concluded it is a Windows bug 
they needed to work around.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 10:11, Magnus Hagander wrote:
  Could it be they broke it when they did that

 In theory, yes, but it still seems a bit far fetched :-(

Well, I rolled back SP1 and am running my test again. Looking much better, 
hasn't locked up in 45mins now, whereas before it would lock up within 5mins.

So I think they broke something.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 13:25, Magnus Hagander wrote:
Could it be they broke it when they did that
  
   In theory, yes, but it still seems a bit far fetched :-(
 
  Well, I rolled back SP1 and am running my test again. Looking
  much better, hasn't locked up in 45mins now, whereas before
  it would lock up within 5mins.
 
  So I think they broke something.

 Wow. I guess I was lucky that I didn't say it was impossible :-)


 But what really is happening. What other thread is actually holding the
 critical section at this point, causing us to block? The only places it
 gets held is while looping the signal queue, but it is released while
 calling the signal function itself...

 But they obviously *have* been messing with critical sections, so maybe
 they accidentally changed something else as well...

 What bothers me is that nobody else has reported this. It could be that
 this was exposed by the changes to the signal handling done for 8.1, and
 the ppl with this level of concurrency are either still on 8.0 or just
 not on SP1 for their windows boxes yet... Do you have any other software
 installed on the machine? That might possibly interfere in some way?

Just a JDK, JBoss, cygwin (running sshd), and a VNC Server. I don't think that 
interferes.


 But let's have it run for a bit longer to confirm this does help. 

I turned it off after 2.5hr. The longest I had to wait before, with less load, 
was 1.45hr.

 If so, 
 we could perhaps recode that part using a Mutex instead of a critical
 section - since it's not a performance critical path, the difference
 shouldn't be large. If I code up a patch for that, can you re-apply SP1
 and test it? Or is this a production system you can't really touch?

I can do whatever the hell I want with it, so if you could cook up a patch 
that would be great.

As a BTW: I reinstalled SP1 and turned stats collection off. That also seems 
to work, but is not really a solution since we want to use autovacuuming.


 //Magnus

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-10 Thread Jan de Visser
On Friday 10 March 2006 14:27, Jan de Visser wrote:
 As a BTW: I reinstalled SP1 and turned stats collection off. That also
 seems to work, but is not really a solution since we want to use
 autovacuuming.

I lied. I hangs now. Just takes a lot longer...

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-09 Thread Jan de Visser
I have more information on this issue.

First of, the problem now happens after about 1-2 hours, as opposed to the 6-8 
I mentioned earlier. Yey for shorter test cycles.

Furtermore, it does not happen on Linux machines, both single CPU and dual 
CPU, nor on single CPU windows machines. We can only reproduce on a dual CPU 
windows machine, and if we take one CPU out, it does not happen.

I executed the following after it hung:

db=# select l.pid, c.relname, l.mode, l.granted, l.page, l.tuple 
from pg_locks l, pg_class c where c.oid = l.relation order by l.pid;

Which showed me that several transactions where waiting for a particular row 
which was locked by another transaction. This transaction had no pending 
locks (so no deadlock), but just does not complete and hence never 
relinquishes the lock.

What gives? has anybody ever heard of problems like this on dual CPU windows 
machines?

jan



On Monday 06 March 2006 09:38, Jan de Visser wrote:
 Hello,

 While doing performance tests on Windows Server 2003 we observed to
 following two problems.

 Environment: J2EE application running in JBoss application server, against
 pgsql 8.1 database. Load is caused by a smallish number of (very) complex
 transactions, typically about 5-10 concurrently.

 The first one, which bothers me the most, is that after about 6-8 hours the
 application stops processing. No errors are reported, neither by the JDBC
 driver nor by the server, but when I kill the application server, I see
 that all my connections hang in a SQL statements (which never seem to
 return):

 2006-03-03 08:17:12 4504 6632560 LOG:  duration: 45087000.000 ms
  statement: EXECUTE unnamed  [PREPARE:  SELECT objID FROM objects WHERE
 objID = $1 FOR UPDATE]

 I think I can reliably reproduce this by loading the app, and waiting a
 couple of hours.

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-09 Thread Jan de Visser
On Thursday 09 March 2006 15:10, Tom Lane wrote:
 Jan de Visser [EMAIL PROTECTED] writes:
  Furtermore, it does not happen on Linux machines, both single CPU and
  dual CPU, nor on single CPU windows machines. We can only reproduce on a
  dual CPU windows machine, and if we take one CPU out, it does not happen.
  ...
  Which showed me that several transactions where waiting for a particular
  row which was locked by another transaction. This transaction had no
  pending locks (so no deadlock), but just does not complete and hence
  never relinquishes the lock.

 Is the stuck transaction still consuming CPU time, or just stopped?

CPU drops off. In fact, that's my main clue something's wrong ;-)


 Is it possible to get a stack trace from the stuck process?  I dunno
 if you've got anything gdb-equivalent under Windows, but that's the
 first thing I'd be interested in ...

I wouldn't know. I'm hardly a windows expert. Prefer not to touch the stuff, 
myself. Can do some research though...


   regards, tom lane

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Hanging queries on dual CPU windows

2006-03-09 Thread Jan de Visser
On Thursday 09 March 2006 15:10, Tom Lane wrote:
 Is it possible to get a stack trace from the stuck process?  I dunno
 if you've got anything gdb-equivalent under Windows, but that's the
 first thing I'd be interested in ...

Here ya go:

http://www.devisser-siderius.com/stack1.jpg
http://www.devisser-siderius.com/stack2.jpg
http://www.devisser-siderius.com/stack3.jpg

There are three threads in the process. I guess thread 1 (stack1.jpg) is the 
most interesting.

I also noted that cranking up concurrency in my app reproduces the problem in 
about 4 minutes ;-)

With thanks to Magnus Hagander for the Process Explorer hint.

jan

-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Hanging queries and I/O exceptions

2006-03-06 Thread Jan de Visser
Hello,

While doing performance tests on Windows Server 2003 we observed to following 
two problems.

Environment: J2EE application running in JBoss application server, against 
pgsql 8.1 database. Load is caused by a smallish number of (very) complex 
transactions, typically about 5-10 concurrently.

The first one, which bothers me the most, is that after about 6-8 hours the 
application stops processing. No errors are reported, neither by the JDBC 
driver nor by the server, but when I kill the application server, I see that 
all my connections hang in a SQL statements (which never seem to return):

2006-03-03 08:17:12 4504 6632560 LOG:  duration: 45087000.000 ms  statement: 
EXECUTE unnamed  [PREPARE:  SELECT objID FROM objects WHERE objID = $1 FOR 
UPDATE]

I think I can reliably reproduce this by loading the app, and waiting a couple 
of hours.



The second problem is less predictable:

JDBC exception:

An I/O error occured while sending to the backend.
org.postgresql.util.PSQLException: An I/O error occured while sending to the 
backend.
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:214)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)


In my server log, I have:

2006-03-02 12:31:02 5692 6436342 LOG:  could not receive data from client: A 
non-blocking socket operation could not be completed immediately.

At the time my box is fairly heavy loaded, but still responsive. Server and 
JBoss appserver live on the same dual 2Ghz Opteron.

A quick Google told me that:

1. More people have seen this.
2. No solutions.
3. The server message appears to indicate an unhandled WSAEWOULDBLOCK winsock 
error on recv(), which MSDN said is to be expected and should be retried.

Is this a known bug?

jan 


-- 
--
Jan de Visser                     [EMAIL PROTECTED]

                Baruk Khazad! Khazad ai-menu!
--

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq