Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Staale Smedseng
On Thu, 2008-02-07 at 18:12, Simon Riggs wrote: I just realised you are using a lookup to get the text for the name of the lock. You used the same lookup table for both releases? Oh, it wasn't quite that bad. :-) The two DTrace scripts had been revised to correspond with the two different

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah
I don't think my earlier message got through. We use separate lookup tables for 8.2.5 and 8.3, each based on the lwlock.h for that version. -Jignesh Simon Riggs wrote: On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote: On Wed, 2008-02-06 at 19:55, Tom Lane wrote: I am

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Simon Riggs
On Thu, 2008-02-07 at 16:29 +0100, Staale Smedseng wrote: On Wed, 2008-02-06 at 19:55, Tom Lane wrote: I am wondering if the waits are being attributed to the right locks --- I remember such an error in a previous set of dtrace results, and some of the other details such as claiming

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Tom Lane
Staale Smedseng writes: Good catch. We've checked the DTrace scripts against the respective versions of lwlock.h, and the FirstLockMgrLock is off (these are actually the results for FirstBufMappingLock). However, this is the last lock in the enum that we trace, the other

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Gregory Stark
Jignesh K. Shah writes: For about 500 users: For about 700 users: At 1000 users: This is a tangent, but are these actual Postgres processes? What's the logic behind trying to run 1,000 processes on a box with 16 CPUs? They're all just going to be queuing up for I/O requests

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah
Last try for the script/results (truncating less significant portions of output which are too big) Staale Smedseng wrote: other locks should have been output correctly, however. But as Tom pointed out, the dynamic locks were not in the equation. So now we're measuring all lock waits instead

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Tom Lane
Gregory Stark writes: This is a tangent, but are these actual Postgres processes? What's the logic behind trying to run 1,000 processes on a box with 16 CPUs? We should certainly be careful about trying to eliminate contention in this scenario at the cost of making things

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Staale Smedseng
On Wed, 2008-02-06 at 19:55, Tom Lane wrote: I am wondering if the waits are being attributed to the right locks --- I remember such an error in a previous set of dtrace results, and some of the other details such as claiming shared lock delays but no exclusive lock delays for FirstLockMgrLock

Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Jignesh K. Shah
Tom Lane wrote: Gregory Stark writes: This is a tangent, but are these actual Postgres processes? What's the logic behind trying to run 1,000 processes on a box with 16 CPUs? We should certainly be careful about trying to eliminate contention in this scenario at

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
On Mon, 2008-02-04 at 19:46, Simon Riggs wrote: We've got various facilities now for looking at LWLock waits, but I'd like to have more information about the *reasons* for lock waits. I know it's possible to get backtraces in DTrace at various tracepoints but that can be fairly hard to

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 15:30 +0100, Staale Smedseng wrote: On Mon, 2008-02-04 at 19:46, Simon Riggs wrote: We've got various facilities now for looking at LWLock waits, but I'd like to have more information about the *reasons* for lock waits. I know it's possible to get backtraces in

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Gregory Stark
Staale Smedseng writes: The stack trace shows that the only time the lock is acquired exclusively is from the call to ProcArrayEndTransaction() in CommitTransaction(). I'm not sure, but I think that's only true in 8.3. As I understood it, in 8.2 transaction start also needed

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
What is the frequency distribution of lock wait time on ProcArrayLock? See below for wait time distributions for ProcArrayLock (both shared and exclusive). The time measured is from entry into LWLockAcquire() until return. I've recorded the same data in two different resolutions (ms, and us for
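A minimal sketch of what recording the same wait times at two resolutions can look like. It groups samples into power-of-two buckets, similar in spirit to DTrace's quantize() aggregation. The function name `quantize`, the variable names, and the sample wait times are all hypothetical and made up for illustration; they are not the measurements from this thread, and this is not the script that was used.

```python
from collections import Counter


def quantize(samples_ns, unit_ns):
    """Bucket wait times into power-of-two bins, expressed in the given unit."""
    hist = Counter()
    for ns in samples_ns:
        value = ns // unit_ns
        # Bucket key is the lower bound: 0 for zero, otherwise the
        # largest power of two that is <= value.
        bucket = 0 if value == 0 else 1 << (value.bit_length() - 1)
        hist[bucket] += 1
    return dict(sorted(hist.items()))


# Hypothetical wait times in nanoseconds (invented for illustration only).
waits_ns = [800, 1_500, 3_200, 45_000, 2_400_000, 9_000_000]

by_us = quantize(waits_ns, 1_000)      # microsecond resolution
by_ms = quantize(waits_ns, 1_000_000)  # millisecond resolution

print("us buckets:", by_us)
print("ms buckets:", by_ms)
```

At millisecond resolution every sub-millisecond wait collapses into the zero bucket, which is why a finer second view is useful for short waits.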

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 18:44 +0100, Staale Smedseng wrote: What is the frequency distribution of lock wait time on ProcArrayLock? See below for wait time distributions for ProcArrayLock (both shared and exclusive). The time measured is from entry into LWLockAcquire() until return. I've

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
I'm not sure 32-bit and 64-bit cases are going to be directly comparable. We could have a problem with cache line aliasing on only one or the other for example. Agreed, this is likely comparing apples and oranges. I'll see if I can get a one-to-one comparison done (these were the numbers

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Tom Lane
Gregory Stark writes: Staale Smedseng writes: Also, an interesting observation is that the hot locks seem to have changed from v8.2 to v8.3, making the ProcArrayLock more contended. See the following outputs: PostgreSQL 8.2 (32-bit): ... PostgreSQL 8.3

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 13:55 -0500, Tom Lane wrote: Gregory Stark writes: Staale Smedseng writes: Also, an interesting observation is that the hot locks seem to have changed from v8.2 to v8.3, making the ProcArrayLock more contended. See the following

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Tom Lane
Simon Riggs writes: There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it seems believable that there were 0 lock delays in EXCLUSIVE mode. Not really, considering the extremely limited use of LW_SHARED in lock.c (GetLockConflicts is used only by CREATE

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Simon Riggs
On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote: Simon Riggs writes: There were only 2 lock delays for FirstLockMgrLock in SHARED mode, so it seems believable that there were 0 lock delays in EXCLUSIVE mode. Not really, considering the extremely limited use of LW_SHARED

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Tom Lane
Simon Riggs writes: On Wed, 2008-02-06 at 14:42 -0500, Tom Lane wrote: Not really, considering the extremely limited use of LW_SHARED in lock.c (GetLockConflicts is used only by CREATE INDEX CONCURRENTLY, and GetLockStatusData only by the pg_locks view). For the type of

Re: [HACKERS] Why are we waiting?

2008-02-05 Thread Simon Riggs
On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: Basically I'd rather try to attack the problem with dtrace ... OK. I'll switch to Solaris. Or do you know something I don't about dtrace on Linux? -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

Re: [HACKERS] Why are we waiting?

2008-02-05 Thread Heikki Linnakangas
Simon Riggs wrote: On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: Basically I'd rather try to attack the problem with dtrace ... OK. I'll switch to Solaris. Or do you know something I don't about dtrace on Linux? One idea would be to add new arguments to LWLockAcquire as you suggest, but

Re: [HACKERS] Why are we waiting?

2008-02-05 Thread Simon Riggs
On Tue, 2008-02-05 at 14:14 +, Heikki Linnakangas wrote: Simon Riggs wrote: On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: Basically I'd rather try to attack the problem with dtrace ... OK. I'll switch to Solaris. Or do you know something I don't about dtrace on Linux? One

Re: [HACKERS] Why are we waiting?

2008-02-05 Thread Tom Lane
Simon Riggs writes: On Mon, 2008-02-04 at 17:06 -0500, Tom Lane wrote: Basically I'd rather try to attack the problem with dtrace ... OK. I'll switch to Solaris. Or do you know something I don't about dtrace on Linux? Nope :-(. The SystemTap guys keep promising support for

[HACKERS] Why are we waiting?

2008-02-04 Thread Simon Riggs
We've got various facilities now for looking at LWLock waits, but I'd like to have more information about the *reasons* for lock waits. I know it's possible to get backtraces in DTrace at various tracepoints but that can be fairly hard to interpret. I'm thinking of adding an extra parameter onto

Re: [HACKERS] Why are we waiting?

2008-02-04 Thread Tom Lane
Simon Riggs writes: I'm thinking of adding an extra parameter onto every call to LockBuffer() and LWLockAcquire() to explain the reason for the lock request. Maybe I'm missing something, but I don't see what this would buy us, except for being able to track which call site

[HACKERS] Why are we waiting? Thoughts on Further Scalability

2007-07-26 Thread Simon Riggs
I've been thinking some more about scalability and what we need to measure in order to locate and remove the next set of bottlenecks. EXCLUSIVE LOCKS: The lock wait time distribution and the sum of lock held time are of interest in understanding contention. SHARED LOCKS: Shared locks present some